TEXTBOOKS IN MATHEMATICS 


AN INVITATION TO 
ABSTRACT ALGEBRA 


Steven J. Rosenberg 


CRC Press
Taylor & Francis Group


A CHAPMAN & HALL BOOK 




Textbooks in Mathematics 
Series editors: 
Al Boggess, Kenneth H. Rosen 


Functional Linear Algebra 
Hannah Robbins 


Introduction to Financial Mathematics 
With Computer Applications 
Donald R. Chambers, Qin Lu 


Linear Algebra 
An Inquiry-based Approach 
Jeff Suzuki 


The Geometry of Special Relativity 
Tevian Dray 


Mathematical Modeling in the Age of the Pandemic 
William P. Fox 


Games, Gambling, and Probability 
An Introduction to Mathematics 
David G. Taylor 


Linear Algebra and Its Applications with R 
Ruriko Yoshida 


Maple™ Projects of Differential Equations 
Robert P. Gilbert, George C. Hsiao, Robert J. Ronkese 


Practical Linear Algebra 
A Geometry Toolbox, Fourth Edition 
Gerald Farin, Dianne Hansford 


An Introduction to Analysis, Third Edition 
James R. Kirkwood 


Student Solutions Manual for Gallian’s Contemporary Abstract Algebra, Tenth Edition 
Joseph A. Gallian 


Elementary Number Theory 
Gove Effinger, Gary L. Mullen 


Philosophy of Mathematics 
Classic and Contemporary Studies 
Ahmet Cevik 


An Introduction to Complex Analysis and the Laplace Transform 
Vladimir Eiderman 


An Invitation to Abstract Algebra 
Steven J. Rosenberg 


https://www.routledge.com/Textbooks-in-Mathematics/book-series/CANDHTEXBOOMTH


An Invitation to 
Abstract Algebra 


Steven J. Rosenberg 


CRC Press 
Taylor & Francis Group 
Boca Raton London New York 


CRC Press is an imprint of the 
Taylor & Francis Group, an informa business 


A CHAPMAN & HALL BOOK 


Cover: Evariste Galois viewed through a kaleidoscope. Original Galois portrait © Emily Koch; 
kaleidoscope software by Steve J. Rosenberg. 


First edition published 2022 
by CRC Press 
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742 


and by CRC Press 
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN 


© 2022 Steven J. Rosenberg 
CRC Press is an imprint of Taylor & Francis Group, LLC 


Reasonable efforts have been made to publish reliable data and information, but the author and pub- 
lisher cannot assume responsibility for the validity of all materials or the consequences of their use. 
The authors and publishers have attempted to trace the copyright holders of all material reproduced 
in this publication and apologize to copyright holders if permission to publish in this form has not 
been obtained. If any copyright material has not been acknowledged please write and let us know so 
we may rectify in any future reprint. 


Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, 
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or 
hereafter invented, including photocopying, microfilming, and recording, or in any information stor- 
age or retrieval system, without written permission from the publishers. 


For permission to photocopy or use material electronically from this work, access
www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please
contact mpkbookspermissions@tandf.co.uk


Trademark notice: Product or corporate names may be trademarks or registered trademarks and are 
used only for identification and explanation without intent to infringe. 


Library of Congress Cataloging-in-Publication Data 


ISBN: 9780367748616 (hbk) 
ISBN: 9781032171784 (pbk) 
ISBN: 9781003252139 (ebk) 


DOI: 10.1201/9781003252139 


Publisher’s note: This book has been prepared from camera-ready copy provided by the authors 


To my wife Eleni, 
without whom I would have been eaten by wolves long 
ago. 




Contents

Preface  xi
Author  xiii
Symbols  xv

1 Review of Sets, Functions, and Proofs  1
  1.1 Sets  1
    1.1.1 Some Special Sets of Numbers  1
    1.1.2 Describing a Set  2
    1.1.3 Operations on Sets  4
  1.2 Functions  5
  1.3 Proofs  8
    1.3.1 [...]  8
    1.3.2 Proof Conventions  10
  1.4 How to Read This Book  12

2 Introduction: A Number Game  15
  2.1 A Game with Integers  15
  2.2 [...]  16
  2.3 Concluding Remarks  18
  2.4 Exercises  19

3 Groups  21
  3.1 [...]  21
  3.2 Binary Operations  21
  3.3 Groups: Definition and Some Examples  23
  3.4 First Results about Groups  28
  3.5 Exercises  34

4 Subgroups  37
  4.1 Groups inside Groups  37
  4.2 The Subgroup Generated by a Set  40
  4.3 Exercises  43

5 Symmetry  47
  5.1 What is Symmetry?  47
  5.2 Dihedral Groups
  5.3 Exercises

6 Free Groups  55
  6.1 The Free Group Generated by a Set  55
  6.2 Exercises  56

7 [...]
  7.1 Relationships between Groups  57
  7.2 Kernels: How Much Did We Lose?  60
  7.3 Cosets  64
  [...]
  Exercises  69

8 Lagrange’s Theorem  73
  8.1 Cosets and Partitions  73
  8.2 The Size of Cosets
  8.3 Reaping the Consequences  77
  8.4 Exercises  78

9 Special Types of Homomorphisms  81
  9.1 Isomorphisms  81
  [...]  85

10 [...]

11 [...]
  11.3 Ring Homomorphisms, Ideals, and Quotient Rings  103
  11.4 Exercises  110

12 [...]
  12.1 [...]
  12.2 Primes and [...]  113
  12.3 The Ideal Generated by a Set  115
  12.4 Fields and Maximal Ideals  119
  12.5 Exercises

13 Vector Spaces  125
  13.1 Introduction  125
  13.2 Abstract Vector Spaces  126
  13.3 Bases: Generalized Coordinate Systems  129
  13.4 Exercises  136

14 Polynomial Rings  141
  14.1 Polynomials Over a Commutative Ring
  14.2 Polynomials Over a Field
  14.3 Exercises

15 Field Theory  153
  15.1 Extension Fields
  15.2 Splitting Fields
  15.3 Exercises  165

16 Galois Theory  169
  16.1 Field Embeddings  169
  16.2 Separable Extensions
  16.3 Normal Extensions  176
  16.4 Galois Extensions
  16.5 Exercises  182

17 Direct Sums and Direct Products  185
  17.1 Introduction  185
  17.2 Direct Products
  17.3 Direct Sums  190
  17.4 Exercises

18 The Structure of Finite Abelian Groups  197
  18.1 Introduction  197
  18.2 [...]
  18.3 [...]
  18.4 [...]  201
  18.5 [...]  207
  18.6 Exercises  208

19 Group Actions  211
  19.1 Groups Acting on Sets  211
  19.2 Reaping the [...]  215
  19.3 Exercises

20 Learning from Z  223
  20.1 Introduction  223
  20.2 [...]
  20.3 Unique Factorization  228
  20.4 Exercises  238

21 [...]  243
  21.3 Constructible [...]
  [...]  259

22 Solvability of Polynomial Equations by Radicals  263
  22.1 Radicals  263
  22.3 Solvable [...]  265
  22.5 Which [...] Are Solvable
  22.6 The Grand Finale
  22.7 Exercises  280

23 Projects  287
  23.1 Gyrogroups  287
  23.2 Kaleidoscopes  291
  The Axiom of Choice  296
  Some Category Theory  300
  Linear Algebra: Change of Basis  304
  Linear Algebra: Determinants  306
  [...]  313
  [...]  347
  [...]  355

Bibliography  365

Index  367


Preface 


This book arose to fill what I perceived as a hole in the available resources 
for introductory abstract algebra courses. Having taught such a course several 
times, I wanted to provide an experience which did not assume that abstract 
algebra was a bitter pill to be watered down as much as possible, and at the 
other extreme did not assume that everyone in the class had already decided 
to become a mathematics professor themselves. Instead I wanted to bring 
students with me on a shared adventure of mutual awe-inspiring discovery, 
hoping to instill the desire to understand and discover more for themselves: 
to appreciate the subject for its own sake. For abstract algebra is a beautiful, 
profound, and useful subject which is part of the shared language of many 
areas both within and outside of mathematics. 

This text assumes a background in high-school algebra, with some 
trigonometry and analytic geometry (e.g., Cartesian coordinates in the plane 
and the ability to calculate the distance between two points). Some familiarity 
with the properties of the ordinary integers, including divisibility, prime num- 
bers, and congruences, would be helpful. In addition, a first course in calculus 
would be useful in understanding the material presented in Chapter 16 and 
beyond. Finally, some experience with mathematical reasoning would be ben- 
eficial, as this text takes a fairly rigorous approach to its subject, and expects 
the student to understand and create proofs as well as examples throughout. 
Some of these topics are presented briefly in Chapter 1, mainly for reference, 
although a reader with limited background eager to delve into abstract alge- 
bra may be able to use this chapter as a starting point. For the reader desiring 
a more thorough treatment of sets, functions, and proofs at an introductory 
level, the first 12 chapters of [8] are recommended (please note that numbers 
within square brackets refer to entries in the Bibliography; for example, [8] 
refers to Book of Proof by Hammack). 

The book follows a single story arc, starting from humble beginnings with 
arithmetic and high-school algebra, gradually introducing abstract structures 
and concepts, and culminating with Niels Henrik Abel and Evariste Galois’ 
great achievement in understanding how we can—and cannot—represent the 
roots of polynomials. Of course, the presentation benefits hugely from the 
many discoveries, generalizations, refinements, and simplifications that have 
occurred in the two centuries since Abel and Galois did their original work; and 
everything has been filtered through the author’s own preferences and points 
of view. We will learn a lot along the way! The mathematically experienced 
reader may recognize my bias toward commutative algebra and fondness for
number theory. I have tried to begin at the beginning, and to motivate each
new topic with a specific question, until the subject has built up a momentum 
of its own. This approach has led to a fairly tight presentation, so I recommend 
proceeding in sequence through the chapters. Likewise, the exercises at the 
end of each chapter have been designed to support and extend the material in 
the preceding chapter, as well as prepare for the succeeding chapters. These 
exercises, with very few exceptions (and these have been noted), should be 
solvable by anyone who has completed the preceding material in the text. 

The entire text can be covered in two or three 15-week semesters, depending
on the number of instructional contact hours and the thoroughness of the
in-class presentation. I found that I needed three semesters with four hours of 
meeting time each week to cover Chapters 2-22 completely, broken down as
follows:

Semester 1 | Chapters 2-11
Semester 2 | Chapters 12-16
Semester 3 | Chapters 17-22


At the least, Chapters 2-11 should provide sufficient content for a one- 
semester abstract algebra requirement, with coverage of the standard intro- 
ductory topics in group theory and a glimpse of ring theory at the beginning 
(in the form of a number game) and end (when general rings are introduced). 
The two-semester sequence ends with the Fundamental Theorem of Galois 
Theory, which is proved for all finite Galois extensions in any characteristic. 
The student looking to undertake more advanced study should of course com- 
plete the entire book, including all of the exercises (and the projects in the 
final chapter). Some professional mathematicians may raise their eyebrows 
at my decision to include free groups, commutative diagrams, and universal 
properties so early in an introductory text (in Chapters 6, 7, and 17, respec- 
tively), as traditionally these topics have had to wait until graduate school. 
In fact, over the years, my undergraduate students have confirmed my feeling 
that this approach, far from presenting a technical barrier, is actually the most 
natural and easy to grasp. 

The final chapter consists of a collection of projects, in which the reader 
is asked to supply nearly all of the proofs (in a sequence of “tasks”). These 
projects include stand-alone topics of specialized interest, such as kaleido- 
scopes and perfect numbers, as well as explorations of more standard topics 
such as Euclidean domains and power series, and a series of connected projects 
in linear algebra. All of the projects are designed to be accessible to the student 
who has completed the appropriate parts of the preceding chapters. 

Abstract algebra is indeed a deep subject. It may require a year or more 
after a first course in the subject to fully absorb the material and perform an 
internal synthesis (I write this from personal experience). With this in mind, 
abstract algebra can transform not only the way one thinks about mathemat- 
ics, but the way that one thinks—period. 


Author 


Steven J. Rosenberg is a professor in the Mathematics and Computer Sci- 
ence Department at the University of Wisconsin-Superior. He received his 
PhD from Ohio State University. As an educator, Dr. Rosenberg has both 
developed and taught a wide array of courses in mathematics and computer 
science. As a researcher, he has published results in the areas of algebraic 
number theory, cryptographic protocols, and combinatorial designs, among 
others. As a software developer, his clients included Coca-Cola Enterprises 
and the pension agency of Cook County Illinois. He has extensive experience 
in computer science and software engineering. 


Acknowledgments 


The author wishes to thank his former students: Aaron Anders, Josh Bentley, 
Ryan Bruner, Pavle Bulatovic, Armel Fetue, Joe Florestano, Dakota Jacobs, 
Nikola Kuzmanovski, Michael LaValley, Jounglag Lim, Zach Reiswig, Diwash 
Shrestha, Jeremy Syrjanen, and Hao Xu for providing corrections and sugges- 
tions in early drafts of this book; Chelsea Kowitz for giving me her notes from 
my lectures, on which most of the first ten chapters are based; Tom Lenosky, 
for suggesting that I might enjoy taking an abstract algebra class before I had 
declared a math major; and Herman and Minerva Katz for their unconditional 
support. 




Symbols 


Symbol                                  Description                                                 Page
$\in$                                   is an element of                                            1
$\subseteq$                             is a subset of                                              1
$\subset$                               is a proper subset of                                       1
$|S|$                                   cardinality (size) of a set S                               1
$|A|$                                   determinant of a square matrix A                            45, 100
$|x|$                                   order of a group element x                                  77
$\mathbb{Z}$                            the set of all integers                                     1
$\mathbb{Z}^+$                          the set of all positive integers                            1
$\mathbb{N}$                            the set of all non-negative integers                        2
$\mathbb{Q}$                            the set of all rational numbers                             2
$\mathbb{R}$                            the set of all real numbers                                 2
$\mathbb{C}$                            the set of all complex numbers                              2
$\gcd(a, b)$                            greatest common divisor of a and b                          2, 235
$\gcd(A)$                               greatest common divisor of a set A                          235
$\mathrm{lcm}(a, b)$                    least common multiple of a and b                            2
$\notin$                                is not an element of                                        2
$\nsubseteq$                            is not a subset of                                          2
$\emptyset$                             the empty set                                               2
$:$                                     such that                                                   3
$\mid$                                  such that; divides                                          3; 114
$\cup$                                  union                                                       4
$\cap$                                  intersection                                                4
$-$                                     set difference                                              4
$\times$                                Cartesian product; componentwise product                    4; 188
$S^n$                                   nth Cartesian power of a set S                              4
$f : D \to C$                           f is a function from D to C                                 5
$f(x)$                                  output value of a function f at input x                     6
$\mapsto$                               elementwise mapping                                         6
$\mathrm{id}_S$                         identity function on a set S                                6
$\circ$                                 composition of functions                                    6
$f(S)$                                  image of a set S under a function f                         6
$f^{-1}(S)$                             pre-image of a set S under a function f                     7
$f^{-1}$                                inverse function                                            7
$< \infty$                              is finite                                                   7
$f|_S$                                  restriction of a function f to a set S                      7
$\forall$                               for all                                                     8
$\exists$                               there exists                                                8
$\Rightarrow$                           implies                                                     8
$\Leftrightarrow$                       if and only if                                              8
$(f_i)_{i \in I}$                       indexed collection                                          9, 195
$:=$ or $=:$                            equals by definition                                        11
$e$ (or $e_G$)                          identity element of a group (named G)                       23, 38
$0$                                     identity element of an abelian group                        25
$\mathrm{Sym}(S)$                       symmetric group on a set S                                  26
$S_n$                                   symmetric group on $\{1, 2, \ldots, n\}$                    27
$a^{-1}$                                inverse of group element a                                  30
$a^n$                                   nth power of a in a group                                   31
$-a$                                    inverse of a in an abelian group                            33
$n \cdot a$                             nth power of a in an abelian group                          33
$!$                                     factorial                                                   35
$\leq$                                  is a subgroup, subring, or subspace of                      38, 103, 128
                                        or partial order on a set                                   296
$<$                                     is a proper subgroup, subring, or subspace of               38, 103, 128
$\nleq$                                 is not a subgroup of                                        38
$\langle S \rangle$                     subgroup generated by a set S                               41
                                        subspace spanned by S                                       128
$\langle s_1, \ldots, s_n \rangle$      subgroup generated by elements $s_1, \ldots, s_n$           41
$P_n$                                   regular n-sided polygon                                     48
$D_{2n}$                                dihedral group of rigid symmetries of $P_n$                 49
$\equiv$                                is congruent to                                             54, 253
$\mathrm{Fr}(S)$                        free group on a set S                                       56
$\ker(\sigma)$                          kernel of $\sigma$                                          60, 104, 137
$\ln$                                   natural logarithm function                                  61
$\trianglelefteq$                       is a normal subgroup of                                     64
$xS$                                    left coset of S                                             65
$Sx$                                    right coset of S                                            65
$A/B$                                   quotient object A mod B                                     67, 106
$\mathbb{Z}_n$                          quotient group $\mathbb{Z}$ mod $n\mathbb{Z}$ under addition 68
$N_G(H)$                                normalizer of a subgroup H in G                             71
$\sqcup$                                disjoint union                                              73
$[A : B]$                               index of B in A                                             76, 163
$\cong$                                 is isomorphic to                                            82, 137
$\hookrightarrow$                       embeds in                                                   87
$\mathrm{Aut}(A)$                       group of automorphisms of A                                 88, 103
$\mathrm{Inn}(G)$                       group of inner automorphisms of a group G                   88
$Z(G)$                                  center of a group G                                         88
$\langle\langle S \rangle\rangle$       normal subgroup generated by a set S                        92
$G^{\mathrm{ab}}$                       abelianization of a group G                                 94
$1$ or $1_R$                            identity element (of R) under multiplication                99
$M_2(\mathbb{R})$                       ring of all $2 \times 2$ matrices over $\mathbb{R}$         100
$GL_n(F)$                               general linear group of $n \times n$ matrices over F        100, 139
$C(R)$                                  center of a ring R                                          111
$(S)$                                   ideal generated by a set S                                  116
$(s_1, \ldots, s_n)$                    ideal generated by elements $s_1, \ldots, s_n$              116
                                        or n-cycle in a symmetric group                             213
$\dim_F(V)$                             dimension of a vector space V over the field F              135
$M_{m \times n}(F)$                     ring of all $m \times n$ matrices over the field F          138
$M_n(F)$                                ring of all $n \times n$ matrices over the field F          139
$R[x]$                                  polynomial ring in x with coefficients from R               141
$\deg(f)$                               degree of a polynomial f                                    144
$\deg_v(f)$                             degree of a polynomial f in the variable v                  238
$\deg_{\mathrm{tot}}(f)$                total degree of a polynomial f                              284
$f(a)$                                  result of evaluating a polynomial f at a                    145
$E_a$                                   polynomial evaluation-at-a map                              145
$R[a]$                                  image of evaluation-at-a map, R adjoin a                    146
$R[x_1, \ldots, x_n]$                   multivariate polynomial ring over R                         152, 238
$\mathrm{Irr}(a, F, x)$                 irreducible polynomial of a over F in x                     155
$F[p_1, \ldots, p_n]$                   F adjoin the elements $p_1, \ldots, p_n$                    160
$\mathrm{Aut}(K/F)$                     group of automorphisms of K over F                          170
$f'$                                    derivative of a polynomial f                                173
$\mathrm{char}(R)$                      characteristic of a ring R                                  175
$\mathrm{Gal}(K/F)$                     Galois group of K over F                                    178
$K^H$                                   fixed field of a group H in the field K                     179
$\exists!$                              there exists a unique                                       185
$\prod$                                 repeated product                                            189
$\oplus$                                direct sum                                                  192
$\varphi$                               Euler’s totient function                                    194
$A \cdot B$                             setwise product or ideal product                            197, 233
$A + B$                                 setwise sum (in an abelian group)                           197
$\sum_{i \in I} G_i$                    repeated setwise sum                                        198
$G_p$                                   maximal p-subgroup of an abelian group G                    200
$kG$                                    set of all kth powers in an abelian group G                 202
$\mathbb{Z}_p$                          additive group of integers modulo p                         203
$\mathbb{F}_p$                          field of integers modulo p                                  203
$\|$                                    exactly divides                                             204
$\mathrm{exponent}(G)$                  exponent of a group G                                       209
$\mathrm{Orb}_G(x)$                     orbit of x under the action of a group G                    212
$\mathrm{Stab}_G(x)$                    stabilizer of x under the action of a group G               212
$R[U^{-1}]$                             localization of a domain R at U                             225
$k(R)$                                  field of fractions of a domain R                            228
$F(x)$                                  field of rational functions over F in x                     228
$\gg 0$                                 is sufficiently large                                       230
[...]                                   exponent of a prime ideal p in a                            234
$\mathrm{cont}(f)$                      content of a polynomial f                                   236
$\zeta_n$                               the primitive nth root of unity $e^{2\pi i/n}$ in $\mathbb{C}$  250
$\Phi_n(x)$                             nth cyclotomic polynomial                                   251
$N_{K/F}$                               norm map from K down to F                                   260
$\det(M)$                               determinant of the square matrix M                          308
$O_n(F)$                                orthogonal group of $n \times n$ matrices over F            324
$SO_n(F)$                               special orthogonal group of $n \times n$ matrices over F    324
$\mathrm{Ann}_R(M)$                     annihilator of the R-module M                               359
$\mathrm{Tor}_R(M)$                     torsion subset of the R-module M                            361


1 


Review of Sets, Functions, and Proofs 


1.1 Sets 


Of all the things studied in mathematics, the most fundamental is the set. 
Although the notion of sets as the basis for thinking about math was only in- 
troduced relatively recently (for mathematics!) by Georg Cantor in the 1870s, 
sets now occupy a central position at the heart of mathematics. 

A set is to be thought of as a collection of things, called the elements of 
the set. Elements of a set can be any type of thing: numbers, functions (to 
be defined later), or even other sets. Two sets are considered equal when they 
have precisely the same elements. We use the ordinary equality symbol to 
indicate that two sets are equal, as in “S = T.” 


Notation 1.1. Let S be a set. The statement
$x \in S$,

read “x is in S” or “x in S,” means that x is an element of the set S. The
statement
$T \subseteq S$,

read “T is a subset of S,” means that each element of T is also an element of
S. The statement
$T \subset S$,

read “T is a proper subset of S,” means that T is a subset of S and $T \neq S$.


Definition 1.2. The cardinality of a set is the number of distinct elements 
in the set. We denote the cardinality of the set S by $|S|$. We also speak of
cardinality as size. 


Definition 1.3. A singleton set is a set of cardinality 1. 


1.1.1 Some Special Sets of Numbers 


We have special notation for certain of the most important sets of numbers: 
$\mathbb{Z}$ is the set of all integers, or whole numbers; we use the notation $\mathbb{Z}^+$ to
denote the set of all positive integers;


DOI: 10.1201/9781003252139-1




$\mathbb{N}$ is the set of all natural numbers, or non-negative integers (note that we
include 0 as an element of $\mathbb{N}$, but some other authors do not);

$\mathbb{Q}$ is the set of all rational numbers, i.e., numbers which are ratios of two
integers, where the denominator is not zero;

$\mathbb{R}$ is the set of all real numbers; and finally,

$\mathbb{C}$ is the set of all complex numbers.

As the first and most basic set of numbers in our list above, we remind the 
reader of some of the properties of $\mathbb{Z}$. Given two integers a and b, we say that
b divides a, written $b \mid a$, if there is some integer c such that $a = b \cdot c$, where $\cdot$
is ordinary multiplication.

The greatest common divisor of a and b, written gcd(a, b), is the largest
integer d such that both $d \mid a$ and $d \mid b$; the gcd exists if $a \neq 0$ or $b \neq 0$.

The least common multiple of a and b, written lcm(a, b), is the smallest
positive integer m such that both $a \mid m$ and $b \mid m$; the lcm exists if $a \neq 0$ and
$b \neq 0$.

An integer p is called prime if $p \geq 2$, and there is no integer a such that
$1 < a < p$ and $a \mid p$. Two positive integers a and b are called relatively prime
if we have gcd(a, b) = 1.
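
The definitions of divisibility, gcd, lcm, and primality can be tried out on a computer. The following is only a sketch in Python, not part of the text: the function names divides, lcm, is_prime, and relatively_prime are our own choices, and the primality test is the naive one read directly off the definition. One caveat: Python's math.gcd returns 0 for gcd(0, 0), a case in which the gcd as defined here does not exist.

```python
from math import gcd


def divides(b: int, a: int) -> bool:
    """Return True when b | a, i.e. a = b * c for some integer c."""
    if b == 0:
        return a == 0  # 0 * c is always 0, so 0 divides only 0
    return a % b == 0


def lcm(a: int, b: int) -> int:
    """Smallest positive common multiple; requires a != 0 and b != 0."""
    if a == 0 or b == 0:
        raise ValueError("lcm(a, b) needs a != 0 and b != 0")
    return abs(a * b) // gcd(a, b)


def is_prime(p: int) -> bool:
    """p >= 2 and no integer a with 1 < a < p divides p."""
    return p >= 2 and not any(divides(a, p) for a in range(2, p))


def relatively_prime(a: int, b: int) -> bool:
    """a and b are relatively prime when gcd(a, b) = 1."""
    return gcd(a, b) == 1


print(divides(7, 42))                         # 42 = 7 * 6
print(gcd(12, 18), lcm(12, 18))
print([p for p in range(20) if is_prime(p)])
print(relatively_prime(8, 15))
```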


1.1.2 Describing a Set 


There are four common styles used to describe a set: 

(1) We can list all the elements of the set inside of “curly braces,” { and }. 
This style is used for small sets for which it is convenient (or at least possible!) 
to actually write down (symbols for) all the elements. We refer to this style 
as explicit description. 


Example 1.4. Let $S = \{0, 7, 14, 21, 28, 35, 42, 49\}$. Then S is a set consisting of
8 distinct elements, so we have $|S| = 8$. We can describe S in words by saying
that S is the set of all integers between 0 and 50, inclusive, which are integer
multiples of 7. It is correct to write $21 \in S$ and $42 \in S$, for example. Since 3
is not an element of S, we indicate this fact by writing $3 \notin S$. It is correct to
write $\{0, 21, 42\} \subseteq S$ and also to write $\{0, 21, 42\} \subset S$. However, since $\{0\} \notin S$
(all of the elements of S are numbers, not sets), the set $\{\{0\}\}$ is not a subset
of S, which we can indicate by writing $\{\{0\}\} \nsubseteq S$.
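
The claims in Example 1.4 can be checked mechanically. The sketch below uses Python's built-in set type as a stand-in for a finite mathematical set; this is our own illustration, and the variable names are arbitrary. Since a Python set cannot contain an ordinary (mutable) set, we use frozenset to model the inner set {0}.

```python
S = {0, 7, 14, 21, 28, 35, 42, 49}
T = {0, 21, 42}

print(len(S))              # cardinality |S|
print(21 in S, 42 in S)    # membership
print(3 in S)              # 3 is not an element of S
print(T <= S)              # T is a subset of S
print(T < S)               # T is a proper subset of S

inner = frozenset({0})     # models the set {0}
print(inner in S)          # {0} is a set, not a number, so it is not in S
print({inner} <= S)        # hence {{0}} is not a subset of S
```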


Example 1.5. It is possible to have a set with no elements at all. There is only
one such set (check the definition of when two sets are considered to be equal
to each other, given above). This set is called the empty set, and denoted $\emptyset$.
It is correct to write $\emptyset = \{\}$, which says that the empty set has no elements.
However, note that $\emptyset \neq \{\emptyset\}$. We consider the empty set to be a thing itself
(as opposed to nothing!), and so the set $\{\emptyset\}$ does have an element, namely
$\emptyset$. Thus, $\emptyset \in \{\emptyset\}$ but $\emptyset \notin \emptyset$, so the two sets $\emptyset$ and $\{\emptyset\}$ are not equal. Note,
however, that $\emptyset \subseteq \emptyset$. In fact, the empty set is a subset of every set.
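
The difference between the empty set and the set whose only element is the empty set can also be seen concretely. In this sketch (again our own illustration, with hypothetical names empty and wrapper) we use Python's frozenset so that one set may be an element of another.

```python
empty = frozenset()            # the empty set
wrapper = frozenset({empty})   # the set whose only element is the empty set

print(len(empty), len(wrapper))   # sizes of the two sets
print(empty in wrapper)           # the empty set is an element of wrapper
print(empty in empty)             # the empty set has no elements at all
print(empty == wrapper)           # so the two sets are not equal
print(empty <= empty)             # the empty set is a subset of itself
print(empty <= wrapper)           # ... and of every other set
```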


(2) We may list several elements of the set, inside of curly braces, until
a pattern is obvious, and then write the ellipsis symbol “...” (three dots),
followed by the final elements of the set (if the set is finite). In case the
pattern also continues in the opposite direction, we write an ellipsis before the
first explicitly listed elements as well. The reader should beware of possible
ambiguity in using style (2): use this style sparingly if at all, and only use it
when the pattern of set elements is quite clear.

Example 1.6. Let U = {0,7,14,...,49}. Then we have U = S, where S is the 
set from Example 1.4. At least, this is the most obvious interpretation of what 
happens in the omitted section of our description of U; there could possibly 
be other interpretations! 


Example 1.7. We have
$\mathbb{N} = \{0, 1, 2, 3, \ldots\}$

and
$\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$.


Notice that the number of elements we choose to write explicitly is up to us; 
we just want to be sure that the pattern of elements is obvious. 


(3) We can write a set in the implicit style
$\{x \in S : C\}$

or
$\{x \in S \mid C\}$,

where:

x is a simple variable name (we may use x itself, or any other letter or
symbol not already used for another purpose);

S is some already-known set; and

C is some condition involving the variable x.

In this context, “:” and “|” are each read “such that.” The entire expression
is read “the set of all x in S such that C” or “the set of all x in S satisfying
C.”

We note that the variable used inside the set description (called x here) 
is a so-called dummy variable: this means that it has no meaning outside of 
the curly braces describing this set; it is only a place-holder which is to be 
thought of as running through all elements of the given set S. 


Example 1.8. We can describe the set of all natural numbers $\mathbb{N}$ as follows:
$\mathbb{N} = \{n \in \mathbb{Z} : n \geq 0\}$.

Example 1.9. We can describe the set S in Example 1.4 as
$S = \{n \in \mathbb{N} : 7 \mid n \text{ and } n < 50\}$.

Here, we have the compound condition “$7 \mid n$ and $n < 50$.” We remind the
reader that “$7 \mid n$” is read “7 divides n,” which means that there is some
integer m such that $n = 7 \cdot m$; in other words, 7 is a factor of n, or n is a
multiple of 7 (by some integer). Notice that in this case, the symbol “|” means
“divides,” and not “such that.” In general, we prefer to use the colon “:” for
“such that,” since the vertical bar, “|,” has so many other uses (however, the
colon is also used to mean something else, in function notation; see below).
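
The implicit style has a close analogue in the set comprehensions of many programming languages. As a sketch (our own, not part of the text), here is the set of Example 1.9 in Python, where range(50) plays the role of the natural numbers n with n < 50 and the test n % 7 == 0 expresses that 7 divides n. The second comprehension illustrates that the dummy variable may be renamed without changing the set.

```python
S_explicit = {0, 7, 14, 21, 28, 35, 42, 49}

# {n in N : 7 | n and n < 50}
S_implicit = {n for n in range(50) if n % 7 == 0}

# The same set, written with a different dummy variable.
S_renamed = {k for k in range(50) if k % 7 == 0}

print(S_implicit == S_explicit)
print(S_renamed == S_implicit)
```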




(4) We can describe a set as the collection of all things of a certain form: 
all “formulas” whose variables come from certain specified sets, and which 
satisfy certain given conditions: 


$S = \{(\text{formula}) : (\text{conditions})\}$.

Example 1.10. We can write the set $\mathbb{Q}$ of all rational numbers as
$\mathbb{Q} = \{a/b : a, b \in \mathbb{Z} \text{ and } b \neq 0\}$.

Example 1.11. Notice that style (3) can be replaced by style (4). For example,
we can write the set S from Example 1.4 using style (4) as

$S = \{n : n \in \mathbb{N},\ 7 \mid n, \text{ and } n < 50\}$.
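
Style (4) can be imitated in the same way, with a formula in place of a bare variable. Since the set of all rational numbers is infinite, the sketch below builds only a finite sample of it: all a/b with a and b between -BOUND and BOUND and b nonzero. The name BOUND and its value are arbitrary choices of ours. The sample also illustrates that a set records only distinct elements: different pairs (a, b), such as (1, 2) and (-1, -2), give the same rational number, which is counted once.

```python
from fractions import Fraction

BOUND = 3  # arbitrary cutoff so that the sample is finite

# All admissible pairs (a, b) with b != 0.
pairs = [(a, b)
         for a in range(-BOUND, BOUND + 1)
         for b in range(-BOUND, BOUND + 1)
         if b != 0]

# {a/b : a, b integers in the chosen range and b != 0}
Q_sample = {Fraction(a, b) for (a, b) in pairs}

print(len(pairs), len(Q_sample))          # fewer elements than pairs
print(Fraction(1, 2) == Fraction(-1, -2))  # the same rational number
```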


1.1.3 Operations on Sets


There are several commonly used methods to produce a set from two or more 
given sets. 
The union of two sets A and B is
$A \cup B = \{x : x \in A \text{ or } x \in B\}$.

The intersection is defined by
$A \cap B = \{x : x \in A \text{ and } x \in B\}$.

The difference (or set difference) is defined by
$A - B = \{x : x \in A \text{ and } x \notin B\}$.

The Cartesian product of two sets A and B is
$A \times B = \{(a, b) : a \in A \text{ and } b \in B\}$.

Here, we call (a, b) an ordered pair whose first component is a and whose second
component is b. When we take the Cartesian product of a set with itself, we
may use exponential notation: thus, $A^2$ is simply shorthand for $A \times A$.


Example 1.12. Two sets S and T are called disjoint if they have no elements
in common. With set operations, we can write this condition as $S \cap T = \emptyset$.
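
For small finite sets, each of these operations can be computed directly. The following sketch is our own illustration; the sets A and B are arbitrary examples. It uses Python's set operators together with itertools.product for Cartesian products, and Python tuples for ordered pairs.

```python
from itertools import product

A = {1, 2, 3}
B = {3, 4}

union = A | B                           # x in A or x in B
intersection = A & B                    # x in A and x in B
difference = A - B                      # x in A and x not in B
cartesian = set(product(A, B))          # ordered pairs (a, b)
A_squared = set(product(A, repeat=2))   # A x A

print(union, intersection, difference)
print(len(cartesian), len(A_squared))
print((1, 3) in cartesian, (3, 1) in cartesian)  # order matters in a pair
print({1, 2}.isdisjoint({3, 4}))                 # empty intersection
```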


Example 1.13. A point P in the ordinary Cartesian plane can be written using
two coordinates x and y as P = (x, y), where x and y are real numbers. The
set S of all points in the Cartesian plane is thus

$S = \{(x, y) : x, y \in \mathbb{R}\}$.

But now we can realize that this set is merely the Cartesian product of $\mathbb{R}$
with itself:
$S = \mathbb{R}^2$.

So we can say that the Cartesian plane is $\mathbb{R}^2$. Extending this idea, we can
write the set of all points in ordinary three-dimensional space as $\mathbb{R}^3$.

Remark 1.14. In the notation $\mathbb{R}^3$, we can ask whether we mean $A = (\mathbb{R} \times \mathbb{R}) \times \mathbb{R}$
or instead $B = \mathbb{R} \times (\mathbb{R} \times \mathbb{R})$. Formally, these are two different sets; an
element of A looks like ((x, y), z), while an element of B looks like (x, (y, z)).
However, in such situations we will agree to omit the inner parentheses and
treat A and B as equal, so $A = B = \mathbb{R}^3 = \{(x, y, z) : x, y, z \in \mathbb{R}\}$.

The operations of union and intersection may be applied to more than two 
sets at a time; we define the union and intersection of any collection of sets, 
as follows: 


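Definition 1.15. Let $\mathcal{C}$ be a collection of sets (that is, a set whose elements
are sets). Then

$\bigcup_{S \in \mathcal{C}} S = \{x : \text{there exists } S \in \mathcal{C} \text{ such that } x \in S\}$

and
$\bigcap_{S \in \mathcal{C}} S = \{x : \text{for all } S \in \mathcal{C},\ x \in S\}$.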
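
To see Definition 1.15 at work, here is a sketch for a small, nonempty collection of finite sets; the collection itself is an arbitrary example of ours. The two comprehensions mirror the phrases "there exists S" and "for all S" in the definition. For the intersection we search only among elements of the union, which loses nothing when the collection is nonempty.

```python
collection = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]

# x belongs to the union if there exists S in the collection with x in S.
big_union = {x for S in collection for x in S}

# x belongs to the intersection if x is in S for all S in the collection.
big_intersection = {x for x in big_union
                    if all(x in S for S in collection)}

print(big_union)
print(big_intersection)

# Python's built-in operations agree with the definitions.
print(big_union == set().union(*collection))
print(big_intersection == set.intersection(*collection))
```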


1.2 Functions 


Informally, a function is a rule which tells us how to process an input of a 
specified kind to produce an output. More formally, a function f from a set 
D to a set C is a way to unambiguously produce an element of C from any
given element of D. We write

$f : D \to C$

to indicate that f is a function which accepts inputs which are elements of the
set D and produces outputs which are elements of the set C. Here, D is called
the domain of f and C is called the codomain of f. We read “$f : D \to C$”
as “f is a function from D to C” or “f from D to C.”

Even more formally, we can group each input and the corresponding output 
of a function f together in a single ordered pair, and consider f itself to be 
the set of all such ordered pairs: 


Definition 1.16. A function f with domain D and codomain C is a set
\[ f \subseteq D \times C \]


such that for every $x \in D$, there is a unique $y \in C$ for which $(x, y) \in f$.
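
Definition 1.16 can be tested mechanically when D and C are finite. The sketch below is an illustration under that assumption; the name `is_function` and the sample sets are hypothetical and not part of the text.

```python
def is_function(pairs, domain, codomain):
    """Check that `pairs` is a subset of domain x codomain in which every
    element of the domain occurs as a first coordinate exactly once."""
    if any(x not in domain or y not in codomain for (x, y) in pairs):
        return False
    first_coords = [x for (x, _) in pairs]
    return all(first_coords.count(x) == 1 for x in domain)

D = {1, 2, 3}
C = {"a", "b"}
f = {(1, "a"), (2, "a"), (3, "b")}                     # a function from D to C
not_total = {(1, "a"), (2, "b")}                       # no output for the input 3
ambiguous = {(1, "a"), (1, "b"), (2, "a"), (3, "b")}   # two outputs for the input 1
```

Only the first of the three sets of ordered pairs satisfies the definition; the other two fail the existence and the uniqueness requirement, respectively.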




Notation 1.17. Let f be a function. To write f(x) = y means that $(x, y) \in f$.
We may also write $f : x \mapsto y$ for f(x) = y.
Remark 1.18. The condition that for every $x \in D$ there is a unique $y \in C$
for which $(x, y) \in f$ simply says that for every possible input x, there is a
unique output y determined by x using the function f; i.e., a unique y such
that f(x) = y.
Remark 1.19. In the definition of a function given above, the codomain C is
the least essential part. In fact, while the domain D of f is uniquely determined 
by f, the codomain is not. We are always free to replace C by any set which 
contains (at minimum) all the output values of our function. We note, however, 
that in some parts of mathematics it is convenient to include the codomain 
as part of the information about a function, i.e., to define our function to be 
the ordered pair (f,C) instead of just f. 
Remark 1.20. Notice that, as promised, sets are the most basic objects around. 
Even a function was defined to be a special type of set. Actually, the formal 
definition of a function (as given above) says that a function f is exactly what 
we are used to calling the graph of f: namely, the set of all points (x, y) where 
f(x) = y.

If $f : D \to C$ is a function, then we say that f maps D to C, or that f
is a mapping from D to C. Similarly, we read $f : x \mapsto y$ as “f maps x to y.”

For each set, there is a distinguished function on that set which “does 
nothing”; or more accurately, which maps every element of the domain to 
itself, thus revealing the identity of each element: 


Definition 1.21. Let S be a set. The identity function on S is the function
\[ \mathrm{id}_S : S \to S \]
defined by the formula
\[ \mathrm{id}_S(x) = x \]
for every x in S.


In case the codomain of one function is the domain of another function, 
it makes sense to perform two mappings of an input from the first function’s 
domain: 


Definition 1.22. Suppose that $f : A \to B$ and $g : B \to C$ are functions.
Then the composition of g with f is the function $g \circ f : A \to C$ defined by
the formula $(g \circ f)(x) = g(f(x))$ for every $x \in A$.
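
As a small illustration of identity functions and composition, the following Python sketch stores a function on a finite set as a dictionary sending each input to its output. This representation, and the names `identity` and `compose`, are assumptions made for the example only.

```python
def identity(S):
    """The identity function on S, stored as a dictionary x -> x."""
    return {x: x for x in S}

def compose(g, f):
    """Return g o f, defined by (g o f)(x) = g(f(x)) for every x in the domain of f."""
    return {x: g[f[x]] for x in f}

A = {1, 2, 3}
B = {"a", "b"}
f = {1: "a", 2: "b", 3: "a"}      # f : A -> B
g = {"a": 10, "b": 20}            # g : B -> C with C = {10, 20}
g_after_f = compose(g, f)         # g o f : A -> C
```

Composing f with an identity function on either side returns f itself, which matches the idea that the identity function "does nothing."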


If f(x) = y, then we say that y is the image of x (under f). We also define 
images and pre-images of sets: 


Definition 1.23. Let $f : D \to C$ be a function. For any set $A \subseteq D$, we
define the image of A under f to be


\[ f(A) = \{f(x) : x \in A\} \subseteq C. \]


For any set $B \subseteq C$, we define the pre-image of B under f to be
\[ f^{-1}(B) = \{x \in D : f(x) \in B\} \subseteq D. \]
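
Images and pre-images of sets can be computed directly for a function on a finite domain. The sketch below again assumes the dictionary representation of a function; the names `image` and `preimage` and the squaring example are hypothetical.

```python
def image(f, A):
    """f(A) = {f(x) : x in A}, for A a subset of the domain of f."""
    return {f[x] for x in A}

def preimage(f, B):
    """The pre-image {x in D : f(x) in B}, for B a subset of the codomain."""
    return {x for x in f if f[x] in B}

# Squaring on D = {-2, -1, 0, 1, 2}, with codomain taken to be the integers.
f = {-2: 4, -1: 1, 0: 0, 1: 1, 2: 4}
```

Note that the pre-image of a set makes sense even though this f has no inverse function: the pre-image of {4} is {-2, 2}, and the pre-image of {3} is empty.
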
We distinguish three properties which a function may possess. 


Definition 1.24. Let $f : D \to C$ be a function.

We say that f is one-to-one, or injective, if no two elements of D map to the
same point of C. More formally, f is injective means that for all $x_1, x_2 \in D$,
if $x_1 \neq x_2$ then $f(x_1) \neq f(x_2)$.

We say that f is onto, or surjective, if every element of C is the image of
some element of D. More formally, f is surjective if for every $y \in C$, there is
some $x \in D$ such that f(x) = y.

We say that f is bijective, or that f is a bijection, if f is both injective and 
surjective. 
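
For functions between finite sets, the three properties of Definition 1.24 can be checked by exhaustive inspection. The following Python sketch assumes a function is stored as a dictionary and that its outputs all lie in the stated codomain; the function names are our own.

```python
def is_injective(f):
    """No two inputs share an output: as many distinct outputs as inputs."""
    return len(set(f.values())) == len(f)

def is_surjective(f, codomain):
    """Every element of the codomain is the image of some input."""
    return set(f.values()) == set(codomain)

def is_bijective(f, codomain):
    return is_injective(f) and is_surjective(f, codomain)

square = {-1: 1, 0: 0, 1: 1}      # not injective, since -1 and 1 both map to 1
shift = {0: 1, 1: 2, 2: 0}        # a bijection from {0, 1, 2} to itself
```

Injectivity depends only on the set of ordered pairs, while surjectivity also depends on which codomain we have in mind: `square` is surjective onto {0, 1} but not onto {0, 1, 2}.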


Remark 1.25. Two sets have the same cardinality precisely if there is a bijec- 
tion from one to the other. In fact, this may be taken as a definition of what 
it means for two sets to have the same cardinality. 

Remark 1.26. A set S is called finite if there is a natural number n such that 
there exists a bijection from S to {1,2,...,n}. (Notice that this latter set is 
empty if n = 0.) To indicate that S is finite, we sometimes write $|S| < \infty$ or
$|S| = n < \infty$.

Remark 1.27. If $f : D \to C$ is a bijection, then there is a unique function
$g : C \to D$ such that $f \circ g = \mathrm{id}_C$ and $g \circ f = \mathrm{id}_D$. We call g the inverse of f,
and we write $g = f^{-1}$. This may cause some confusion with the notation for
pre-images (introduced above). Even if f is a bijection, however, the notation
is still unambiguous; just remember that we only take pre-images of subsets
of the codomain, whereas the inverse function takes elements of the codomain
as inputs. Also, in case $B \subseteq C$ and f is bijective, then the set $f^{-1}(B)$ is the
same whether we interpret it as the pre-image of B under f or as the image
of B under $f^{-1}$.
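
The inverse of a bijection between finite sets is obtained by reversing every ordered pair. The sketch below assumes the dictionary representation of a function and that f is onto its intended codomain; the name `inverse` and the sample data are hypothetical.

```python
def inverse(f):
    """Reverse every pair (x, y) of f; fail if f is not injective."""
    g = {y: x for x, y in f.items()}
    if len(g) != len(f):
        raise ValueError("f is not injective, so it has no inverse")
    return g

f = {1: "a", 2: "b", 3: "c"}      # a bijection from D = {1, 2, 3} to C = {"a", "b", "c"}
g = inverse(f)                    # g : C -> D

B = {"a", "c"}                                     # a subset of the codomain
preimage_of_B = {x for x in f if f[x] in B}        # pre-image of B under f
image_under_g = {g[y] for y in B}                  # image of B under the inverse
```

The last two sets agree, which illustrates the closing sentence of Remark 1.27.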

If $f : D \to C$ is a function and $S \subseteq D$, then f lets us compute a value
in C for any given input from S. By limiting ourselves to input values from
S, we get another function, which is technically different from f, because its 
domain is S instead of D: 


Definition 1.28. Let $f : D \to C$ be a function, and let $S \subseteq D$. The
restriction of f to S is the function


\[ h : S \to C \]
defined by the formula
\[ h(x) = f(x) \] for all x in S.


In this situation we may also say that f extends h, especially if we began with 
h and found f afterwards. 


Notation 1.29. The restriction of f to S is denoted $f|_S$, read “f restricted to
S.”
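
A restriction keeps the rule of f but shrinks its domain. In the Python sketch below (dictionary representation assumed, the name `restrict` hypothetical), restricting the squaring function to non-negative inputs produces a different function, and one with a property that f lacks.

```python
def restrict(f, S):
    """Return the restriction of f to S: same outputs, but domain S."""
    if not set(S) <= set(f):
        raise ValueError("S must be a subset of the domain of f")
    return {x: f[x] for x in S}

f = {-2: 4, -1: 1, 0: 0, 1: 1, 2: 4}   # squaring on {-2, ..., 2}
h = restrict(f, {0, 1, 2})             # the restriction of f to {0, 1, 2}
```

Here h is injective although f is not, so the two are genuinely different functions.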




1.3 Proofs


A proof is a chain of reasoning which uses definitions, logic, and previously 
established results in order to establish a new result. Ideally, every step in a 
proof should clearly follow from the previous steps, and should be justifiable 
as just indicated above. 

One of the central skills necessary to succeed in higher mathematics is the 
ability to translate back and forth between informal and formal language. The 
informal language (usually a “natural” language, such as English) is valuable 
because it expresses our ideas in an easy-to-read and easy-to-absorb form, 
and also lets us express ideas when we are still not ready to express ourselves 
precisely. A more formal language (such as the symbolic language of logic) 
is very important because it lets us express our ideas very precisely, without 
the possibility of misinterpretation or ambiguity. The reader should keep in 
mind that when doing math, both points of view are useful; try to understand 
each new concept and each statement at both the informal level—the level 
of intuitive meaning—and the formal level, where it can be expressed most 
precisely. Much of doing math consists of “unpacking” formal statements to 
discover their meaning, then forming a strategy, and finally implementing the 
strategy back on the formal side of mathematics. 


1.3.1 Logic 


Here we only recall the four most important symbols from logic which have 
not yet been introduced in this text: 

“$\forall$” is called the universal quantifier, and is read “for all” or “for every.”

“$\exists$” is called the existential quantifier, and is read “there exists.”

“$\Rightarrow$” is called the implication or conditional symbol, and is read
“implies.”

“$\Leftrightarrow$” is called the biconditional symbol, and is read “if and only if,”
sometimes abbreviated “iff.”

In an informal context, or outside of formal set descriptions, we also use 
“s.t.” to stand for “such that.” 


Example 1.30. We can write the definitions of injectivity and surjectivity
formally, using the language of symbolic logic. Let $f : D \to C$ be a function.
Then f is injective iff


\[ \forall x_1, x_2 \in D,\ x_1 \neq x_2 \Rightarrow f(x_1) \neq f(x_2). \]


The definition of injectivity is often more convenient to work with in the
logically equivalent form


\[ \forall x_1, x_2 \in D,\ f(x_1) = f(x_2) \Rightarrow x_1 = x_2. \]


We can say that f is surjective iff


\[ \forall y \in C,\ \exists x \in D \text{ s.t. } f(x) = y. \]


Example 1.31. We can formally state the definition of when two sets S and T 
are equal, as follows: 


\[ S = T \Leftrightarrow \forall x,\ (x \in S \Leftrightarrow x \in T). \]


Usually we break the right-hand statement down into two parts: $\forall x \in S,\ x \in T$
and $\forall x \in T,\ x \in S$. Note in particular that a set only “knows” whether or not
a given object x is an element of it; being an element of a set is a yes-or-no 
property. There is no way to say that x occurs 42 times in the set S, for 
example. Likewise, elements of a set have no ordering by themselves; a set is 
an unordered collection of elements. If we want order, or repetition, then we 
need to use other constructions. One way to get these additional features is to 
use a Cartesian product; for example, in an ordered pair $(a, b) \in S^2$, we can
say that a is the first component and b is the second component, and we may
have a = b. Another option is given in the following example. 


Example 1.32. Often, it is convenient to use elements of some set I as labels or
subscripts. Formally, let $f : I \to S$ be a function. Then we sometimes use
the notation $f_i$ instead of f(i), and $(f_i)_{i \in I}$ instead of f. In this situation, we
call I the index set, and an element of I is called an index. The function f
itself is called an indexed collection, and we may also refer to f as the indexing
function.

A familiar special case occurs when $I = \mathbb{Z}^+$ and $S = \mathbb{R}$. Let $a : \mathbb{Z}^+ \to \mathbb{R}$
be a function. For every positive integer i, we have $a_i \in \mathbb{R}$. We usually refer to
$(a_i)_{i \in \mathbb{Z}^+}$ as a sequence of real numbers. Notice that there may be repetitions
among the $a_i$, and there is no way to recover the number of such repetitions
or the order of the sequence from the set $\{a_i : i \in \mathbb{Z}^+\}$ of the $a_i$ values alone.

As another example, consider the set $\mathcal{C}$ of all closed intervals in $\mathbb{R}$ of length
2. For example, we have $[4, 6] \in \mathcal{C}$, and $[\pi, \pi + 2] \in \mathcal{C}$. If we know the left-hand
endpoint of such an interval, then that interval is uniquely determined; this
left endpoint is a real number. Therefore, it makes sense to use $\mathbb{R}$ as an index
set here. So for each $x \in \mathbb{R}$ we let


\[ C_x = [x, x + 2] = \{y \in \mathbb{R} : x \leq y \leq x + 2\}, \]
and then observe that
\[ \mathcal{C} = \{C_x : x \in \mathbb{R}\}. \]


In this example, the indexing function C is one-to-one, and we may be content
to use the subscript notation $C_x$ without the rest of the notation for an indexed
collection, since we only care about the resulting set $\mathcal{C}$.
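
The difference between an indexed collection and its set of values can be seen in a finite stand-in for a sequence. In the Python sketch below, the index set {1, ..., 6} and the particular values are made up for illustration; a dictionary plays the role of the indexing function.

```python
# A finite stand-in for a sequence (a_i): the index set is I = {1, ..., 6}.
a = {1: 3, 2: 1, 3: 3, 4: 2, 5: 1, 6: 3}

# The set {a_i : i in I} forgets both the order and the repetitions.
values = set(a.values())

# A different indexed collection with exactly the same set of values.
b = {1: 1, 2: 2, 3: 3}

# The indexed collection itself still records order and repetition.
terms_in_order = [a[i] for i in sorted(a)]
```

The collections a and b are different functions, yet they have the same set of values, so neither can be recovered from that set alone.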


In the present text, we shall not delve deeply into the twin foundational 
subjects of set theory and mathematical logic; there are many books devoted
to each of these subjects, which the interested reader may examine. But the 
reader should be aware that these subjects exist, and that they form the basis 
for what we may accept as “obvious” facts. For example, the substitution 
principle, which states that we may replace any expression with another so 
long as the two expressions are equal, belongs to mathematical logic. We 
make free use of the substitution principle. On the other hand, the historically 
controversial Axiom of Choice, which is equivalent to the statement that any 
non-empty collection of non-empty sets has a non-empty (Cartesian) product, 
turns out to have some very non-obvious consequences; we use this axiom less 
frequently, and make some effort to call attention to it when it is needed. 


1.3.2 Proof Conventions 
1. To prove a universal statement
\[ \forall x \in A,\ P(x) \]
start by choosing an arbitrary element of A:
Let $x \in A$.

Your goal is now to prove P(x).

Common mistakes: choosing x to have special properties instead of
being arbitrary. Also, choosing x to be a compound expression (as
opposed to a single variable symbol), as in “Let $A \cap E \in F$.”


2. To prove an existential statement
\[ \exists x \in A \text{ s.t. } P(x) \]
find a specific element x of A with the property P(x).

Common mistakes: leaving some details of x unspecified (as
variables).
3. To prove an implication
\[ P \Rightarrow Q \]
start by assuming P:
Suppose P.

Your goal is now to prove Q.

Sometimes it is convenient to prove the implication $P \Rightarrow Q$ by
proving the logically equivalent statement $(\text{not } Q) \Rightarrow (\text{not } P)$,
which is called the contrapositive of the original implication.


4. To prove a biconditional statement
\[ P \Leftrightarrow Q \]
break the proof into two sub-proofs: (1) Prove $P \Rightarrow Q$, (2) Prove
$Q \Rightarrow P$.


5. To prove a subset relationship
\[ A \subseteq B \]
convert this statement to the logically equivalent form
\[ \forall x \in A,\ x \in B \]
and prove this statement (using Proof Convention (1)).


6. To prove an equality of sets
\[ A = B \]
break the proof into two sub-proofs: (1) Prove $A \subseteq B$, (2) Prove
$B \subseteq A$.


7. Always introduce a new variable (say x) by writing either
Let x = ...
or
$\exists x \in A$ s.t. ...

If you use the second method, then you need to justify the existence
of x somehow. Sometimes when we introduce a new variable, we
write “x := ...” or “... =: x”; here, := and =: both mean “is equal
to by definition.” Notice that the colon : always goes next to the
new variable.

Common mistakes: using a variable before it is properly introduced;
also, using a dummy variable outside its proper context, as in, “$\forall x \in A$,
P(x). So x satisfies ...” Here, x is a dummy variable that only
makes sense inside the universal statement, not outside it in the
second sentence. Notice that, according to our rule, x has not been
introduced as a variable!


We briefly recall the main proof techniques here: 


1. Direct Proof: In this technique, we start with what we are given
(our hypotheses) and proceed directly to our goal.


2. Proof by Contradiction: To prove a statement P by contradiction,
we start by writing

Assume for a contradiction that P is false.

We proceed until we derive a statement which is clearly false, and
conclude by writing

This contradiction shows that P is true.


3. (a) Proof by Induction: This proof technique applies to statements
of the form
\[ \forall n \in \mathbb{N},\ P(n), \]


where P(n) is a statement which depends on the natural number n
(note: any well-ordered set may be substituted for $\mathbb{N}$ here). To use
induction, we must prove

(i) P(0) is true, and

(ii) $\forall n \in \mathbb{N},\ (P(n) \Rightarrow P(n+1))$.

We first write 

Base Case: n = 0. 

and proceed to prove that P(0) is true. Next we write 


Inductive Step: Suppose that P(n) is true. (We call P(n) the in- 
ductive hypothesis.) 


Our goal is now to prove that P(n + 1) is also true. 


(b) Proof by Strong Induction: As in ordinary induction, strong 
induction may be used to prove statements of the form 


\[ \forall n \in \mathbb{N},\ P(n). \]


But in strong induction, our inductive hypothesis is (apparently)
stronger: namely, P(0) and P(1) and $\cdots$ and P(n). Thus the inductive
step is (sometimes) easier when strong induction is used.
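
To see the induction template in action, here is a short worked example. The statement chosen (the formula for the sum of the first n natural numbers) is a standard one selected for illustration; it is not taken from the text above. Here P(n) denotes the equation in the claim.

```latex
\textbf{Claim.} $\forall n \in \mathbb{N},\ \sum_{k=0}^{n} k = \frac{n(n+1)}{2}$.

\textbf{Proof.}
Base Case: $n = 0$. Both sides of the equation are equal to $0$, so $P(0)$ is true.

Inductive Step: Suppose that $P(n)$ is true, that is,
$\sum_{k=0}^{n} k = \frac{n(n+1)}{2}$. Then
\[
  \sum_{k=0}^{n+1} k
  = \Bigl( \sum_{k=0}^{n} k \Bigr) + (n+1)
  = \frac{n(n+1)}{2} + (n+1)
  = \frac{(n+1)(n+2)}{2},
\]
which is exactly the statement $P(n+1)$. \qed
```

The inductive hypothesis is used in the second equality of the display; every other step is ordinary algebra.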


It is often useful to perform calculations and do specific examples in the 
course of attempting to prove a general result. Similarly, it is a good idea to 
keep track of one’s current goal in a proof, and of the strategy to be used. 
This “scratchwork” seldom appears in the final, clean version of a proof which 
is printed in books. However, in order to give the reader an idea of how 
and why a proof flows the way it does, we have taken the license to include 
scratchwork inside of some of the proofs in this text. Such work is enclosed in 
square brackets, [ and ], so it will not be confused with the proof itself. 


1.4 How to Read This Book 


Like any mathematics text, this book should be read slowly and carefully, with 
plenty of paper at hand to take notes, and especially to work out and verify 
the claims made in the text. One very good habit to form is to pronounce 
everything as you read it (even if this takes place in your head): as you learn 
each new symbol and new notation, make sure to also learn how to pronounce 
it, and practice each time you see it. The alternative leads to skipping ahead
every time you encounter math notation, which is clearly not a good idea! 
And this is only the first step, of course. Make sure to work out all the details 
of each argument of a proof, of each new definition, and of each discussion, 
until you are satisfied with it. Take ownership of the math as you go. 

Reading a mathematics text for understanding is a slow and sometimes 
difficult process. Math is denser than many subjects—just as gold is denser 
than air. 




2 


Introduction: A Number Game 


2.1 A Game with Integers 


Imagine a game that is played as follows. To set up the game, you gather a 
collection of scraps of paper—an infinite collection—and label each scrap with 
an integer. We require that each integer appears on one and only one scrap 
of paper. Then, while you aren’t looking, your friend puts each scrap into its 
own separate bag. 

There is no way for you to see inside the bags, and the bags look identical 
to each other from the outside; they are indistinguishable. But you are allowed 
to perform addition and multiplication on the bags: you may hand over (or 
otherwise point out) any two bags to your friend, and ask for the sum or 
product. Your friend will look inside the bags for you, add or multiply the 
enclosed numbers as requested, and retrieve for you the bag which contains 
the result. You may even ask for the sum or product of a bag with itself. 

For example, we may hand over the bag containing 7 and the bag contain- 
ing 4, and ask for their sum. Our friend will peek inside these bags, search for 
the bag containing 11—this may take a while!—and hand it over to us. But 
our friend never tells us the values of any of the numbers inside. 

The goal of the game is to identify the contents of the bags. (More precisely, 
we would like to find a procedure which eventually identifies the contents of 
any given bag.) 

Is there any strategy you can use to win this game? The answer is yes! 
To see this, think about numbers which have a special relationship to the 
operations of addition and multiplication. The leading candidates are 0 and 1. 

Zero is the only integer which has no effect when added to another integer. 
If you can just find some pair of bags Z and X such that 


\[ Z + X = X, \]


then you will know that bag Z contains the number 0. By systematically 
searching for such a pair, you will eventually be able to find and label the bag 
Z containing 0. 

After discovering 0, you can take advantage of the properties of the number 
1 under multiplication. If you can find two bags U and Y such that 


\[ U \cdot Y = Y, \]


making sure that bag Y is not Z, then bag U must contain 1. Again, after
enough attempts, you will be able to identify the bag U.

DOI: 10.1201/9781003252139-2

One final search is needed to find the bag containing $-1$. One way to do
this is to look for a bag N such that


\[ N + U = Z. \]


At this point, no more trial-and-error is required. You can add U to itself 
to obtain 2’s bag, and keep adding U to identify the bags of all the positive 
integers. Similarly, starting with N, you will be able to take care of all negative 
integers. Thus we can discover the contents of all the bags! 
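
The strategy just described can be simulated. The Python sketch below is one possible model of the game, built on several assumptions of our own: the class names `Bag` and `Friend`, the order in which the friend lists the bags, and the use of `is` to model "this is the same bag" are all hypothetical. The player's code never reads the hidden number inside a bag.

```python
import itertools

class Bag:
    """An opaque bag; the player never reads the hidden number."""
    def __init__(self, hidden):
        self._hidden = hidden

class Friend:
    """The friend, who alone looks inside the bags."""
    def __init__(self):
        self._bags = {}

    def _bag(self, n):
        if n not in self._bags:
            self._bags[n] = Bag(n)
        return self._bags[n]

    def add(self, x, y):
        return self._bag(x._hidden + y._hidden)

    def mul(self, x, y):
        return self._bag(x._hidden * y._hidden)

    def bags(self):
        """List every bag eventually, in an arbitrary order: 5, 4, 6, 3, 7, 2, ..."""
        for k in itertools.count():
            yield self._bag(5 + k)
            yield self._bag(4 - k)

def find(friend, test):
    """Systematic search: the first listed bag that passes the test."""
    return next(bag for bag in friend.bags() if test(bag))

def find_special_bags(friend):
    X = next(friend.bags())                              # any bag will do
    Z = find(friend, lambda b: friend.add(b, X) is X)    # Z + X = X
    Y = find(friend, lambda b: b is not Z)               # some bag other than Z
    U = find(friend, lambda b: friend.mul(b, Y) is Y)    # U . Y = Y with Y not Z
    N = find(friend, lambda b: friend.add(b, U) is Z)    # N + U = Z
    return Z, U, N

def identify(friend, Z, U, N, target):
    """Add U (or N) to Z repeatedly until the target bag turns up."""
    up, down, n = Z, Z, 0
    while up is not target and down is not target:
        up, down, n = friend.add(up, U), friend.add(down, N), n + 1
    return n if up is target else -n
```

After `find_special_bags` has located Z, U, and N, the function `identify` labels any given bag without further trial and error, exactly as in the final step of the strategy.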

For reference, let us speak of the number game we have just finished dis- 
cussing as the number game on Z (recall that Z denotes the set of all integers). 


2.2 A Bigger Game


Now that we know how to win the number game on $\mathbb{Z}$, let’s make things more
interesting by throwing in the irrational number $\sqrt{2}$. The number $\sqrt{2}$ gets its
own scrap of paper (and its own bag), and you, the player, are allowed to add
or multiply bags as before.

But there is a complication here: we cannot simply enlarge our collection
of integer bags with one additional bag containing $\sqrt{2}$. For your friend would
not be able to respond if you asked for the sum, say, of the bags containing
$\sqrt{2}$ and 1. What is the smallest collection of numbers that we need to include
before we can play our new and improved game?

First, observe that any number of the form $b\sqrt{2}$, where $b \in \mathbb{Z}$, needs to be
included. Adding such numbers to ordinary integers, we see that any number
of the form

\[ a + b\sqrt{2} \]


(where a is an integer) also needs to be included. The unhappy prospect
looms of having to repeat this procedure of multiplying, adding, multiplying,
and adding without end. But remarkably, we can stop at this stage. For it is
not hard to check (Exercise 2.2) that the sum or product of any two numbers
of this form is again of the same form. Thus, we replace the set $\mathbb{Z}$ of our first
game with the set

\[ R := \{a + b\sqrt{2} \mid a, b \in \mathbb{Z}\} \tag{2.1} \]


to play our new game. We put each number from R in its own bag, and again 
the goal is to determine the contents of the bags. 
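
Arithmetic in R can be carried out exactly by storing the number a + b√2 as the pair of integers (a, b). The Python sketch below is an illustration only: the class name `ZSqrt2` is hypothetical, and the representation silently assumes that the integers a and b are determined by the number (this is the content of Exercise 2.5).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ZSqrt2:
    """The number a + b*sqrt(2) with integers a and b, stored as the pair (a, b)."""
    a: int
    b: int

    def __add__(self, other):
        return ZSqrt2(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + b*sqrt2)(c + d*sqrt2) = (ac + 2bd) + (ad + bc)*sqrt2
        return ZSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)

x = ZSqrt2(1, 2)      # 1 + 2*sqrt(2)
y = ZSqrt2(3, -1)     # 3 - sqrt(2)
root2 = ZSqrt2(0, 1)  # sqrt(2)
```

Since both operations return another `ZSqrt2` with integer entries, sums and products of elements of R never leave R, which is why the game can be played on this set.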

The same reasoning as before will show that you can identify each ordinary 
integer within its own bag. For, even in this bigger set R, the numbers 0 and 1 
still enjoy their unique properties with respect to addition and multiplication.
Now if you could only identify the bag containing $\sqrt{2}$, you would be able to
finish and win the game (see Exercise 2.3).

One way you might think of to identify $\sqrt{2}$ is to use the fact that if bag T
contains 2 and bag X contains $\sqrt{2}$, then


\[ X \cdot X = T. \tag{2.2} \]


Since T can be found (2 is an integer!), we just need to look for such an X.

But there is a problem: the number $-\sqrt{2}$ has the same property! The two
numbers $\sqrt{2}$ and $-\sqrt{2}$ are the only elements of R which satisfy Equation 2.2,
so your task boils down to distinguishing between these two numbers. How
can you tell apart $\sqrt{2}$ from $-\sqrt{2}$? At this point, the curious reader is invited
to ponder this question before reading on.

It turns out that there is a fundamental problem in distinguishing $\sqrt{2}$ from
$-\sqrt{2}$: it is actually impossible to identify them, no matter how clever you are
or how much time you have!

To convince ourselves of this, consider the following thought experiment. 
Suppose that you start playing the game one day, and at the end of the day, 
after having identified and labeled some of the bags, you go home. During 
the night, unbeknownst to you, your friend does some mischief: your friend 
switches the contents of the bags, replacing the scrap of paper labeled $\sqrt{2}$
with that labeled $-\sqrt{2}$, $1 + \sqrt{2}$ with $1 - \sqrt{2}$, and in general, $m + n\sqrt{2}$ with
$m - n\sqrt{2}$ for arbitrary integers m and n.

The next day, you resume play, and start by checking your work of the 
previous day. One calculation at a time, you repeat the computations of
yesterday’s work. Suppose that, on the first day, bag A contained $a + \alpha\sqrt{2}$ and
bag B contained $b + \beta\sqrt{2}$. Asking for their sum, you would be handed the
bag C containing $(a + b) + (\alpha + \beta)\sqrt{2}$. On the second day, what will happen
when you recalculate this sum? Since bag A will then contain $a - \alpha\sqrt{2}$ and B
will contain $b - \beta\sqrt{2}$, when you compute A + B, you will be handed the bag
containing $(a + b) - (\alpha + \beta)\sqrt{2}$. But this number is now inside bag C, so you
will get the same result as on the first day.

Even more remarkably, computing the product of A and B gives the same
result on the second day as on the first day: on the first day, the result is the
bag D containing


\[ (a + \alpha\sqrt{2}) \cdot (b + \beta\sqrt{2}) = (ab + 2\alpha\beta) + (a\beta + \alpha b)\sqrt{2}, \]
while on the second day the result is the bag E containing
\[ (a - \alpha\sqrt{2}) \cdot (b - \beta\sqrt{2}) = (ab + 2\alpha\beta) - (a\beta + \alpha b)\sqrt{2}. \]


But the bags D and E are the same, since the overnight switch moved this
second number into bag D!

Thus, the results of all computations we could possibly do with our bags 
will be the same on both days. If there were a way you could be sure that a 
certain bag X contained $\sqrt{2}$ on the first day, then the exact same calculations
that led you to that conclusion, but performed on the second day, would lead
you to conclude that this same bag X contains $\sqrt{2}$ again on day two. But on
day two, X would contain $-\sqrt{2}$, and not $\sqrt{2}$. This contradiction shows that
there can be no procedure which allows you to decide which bag contains $\sqrt{2}$.
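
The two computations in the thought experiment can be spot-checked by machine. The Python sketch below stores m + n√2 as the pair (m, n) and tests, on a small sample of pairs, that the overnight switch is compatible with sums and products. The function names and the choice of sample are our own, and a finite check of this kind is an illustration rather than a proof.

```python
import itertools

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def mul(x, y):
    (a, alpha), (b, beta) = x, y
    # (a + alpha*sqrt2)(b + beta*sqrt2) = (ab + 2*alpha*beta) + (a*beta + alpha*b)*sqrt2
    return (a * b + 2 * alpha * beta, a * beta + alpha * b)

def switch(x):
    """The friend's overnight switch: m + n*sqrt(2) becomes m - n*sqrt(2)."""
    return (x[0], -x[1])

sample = [(m, n) for m in range(-3, 4) for n in range(-3, 4)]
pairs = list(itertools.product(sample, repeat=2))

sums_agree = all(switch(add(x, y)) == add(switch(x), switch(y)) for x, y in pairs)
products_agree = all(switch(mul(x, y)) == mul(switch(x), switch(y)) for x, y in pairs)
```

Both checks succeed on the sample: switching and then computing gives the same bag as computing and then switching, so no sequence of sums and products can detect the switch.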


2.3 Concluding Remarks


Our first number game, using the set of integers Z, is winnable. We interpret 
this result as telling us that Z is completely determined by how its elements 
behave under the two operations of addition and multiplication. 

The second game, using the set R, is not winnable; this tells us that R is
not completely determined by its two operations of + and $\cdot$. The key idea in
establishing this fact was to consider switching $m + n\sqrt{2}$ and $m - n\sqrt{2}$. More
formally, we can consider the function


\[ f : R \to R \]


given by


\[ f(m + n\sqrt{2}) = m - n\sqrt{2} \tag{2.3} \]


for $m, n \in \mathbb{Z}$. The reader may recognize the formula above from high school
algebra: $m - n\sqrt{2}$ is sometimes called the conjugate of $m + n\sqrt{2}$, and it is
useful, for example, to rationalize denominators in fractions of the form


\[ \frac{a}{m + n\sqrt{2}}. \]
What makes the function f so useful in analyzing our game is that you cannot 
tell the difference between the bags before and after applying f; there is no 
way to tell that a switch was made. We can express this in a different way 
by saying that f is a symmetry of R. Algebraically speaking, f has a certain 
relationship (to be explored in the exercises) with the operations of addition 
and multiplication, and this relationship gives f its amazing usefulness. 

Stepping back a bit further, these number games may start to give us a 
new perspective on number systems such as Z and R. At the start of a game, 
all the numbers are hidden inside of bags: we have removed any identifying 
information about the numbers. At this point in the game, there would be 
no way to tell apart two different number systems unless their sizes were 
different. Expressed another way, at the start of a game, we have stripped 
away all features of our number system except for the one feature it possesses 
as a mere set: namely, its size. 

When play commences, we can take advantage of the additional features 
of the number system: namely, its operations. Using known properties of these 
operations, such as the existence of the additive identity element 0 and the
multiplicative identity 1, and the rules for addition and multiplication, we can 
determine much about the hidden contents of the bags. In the case of Z, we 
can determine the contents completely. 

In traditional “high school” algebra, we start with a given number system 
such as Z or R and study its properties with respect to the usual operations 
of addition, subtraction, multiplication, etc. We then use these properties to 
solve equations and inequalities containing one (or more) unknown number. In 
abstract algebra, we do not start with a particular system such as the integers 
or the real numbers. Instead, we start with an unknown set and an unknown 
operation, and we specify the properties of the operation. (Later, we may 
allow two operations, or even more general extra structures on our set.) We 
proceed to determine the consequences of these properties; a typical question 
is, to what degree of uniqueness do these properties determine our set and 
operation? 

This more abstract approach to algebra allows us to generalize the notion 
of a number system and to isolate those properties that are most important, 
together with the consequences of those properties. We can thus expand our 
vocabulary beyond that of number systems. Abstract algebra also sheds light 
on number systems themselves. Indeed, the power of abstract algebra is so 
great that it provides elegant, unifying proofs of previously known results 
and allows us to solve problems which can be formulated in terms of classical 
algebra and geometry, but which had defied solution for centuries. These are 
the hallmarks of a successful mathematical theory. 


2.4 Exercises 
Exercise 2.1. Describe a way of identifying the bag containing $-1$ which is
different from the method used in the text. 


Exercise 2.2. Let R be as in Equation 2.1. Prove that for all x and y in R, we
have both $x + y \in R$ and $x \cdot y \in R$. (We say that R is closed under addition
and multiplication.)


Exercise 2.3. Let R be as in Equation 2.1. In the number game for R, explain 
how you could identify the contents of all the bags if you could just determine 
which bag contained $\sqrt{2}$.


Exercise 2.4. A student wishes to consider playing the number game on the
set of integers with the rational number $\frac{1}{2}$ thrown in. Imitating the example
in the text, this student defines the set of numbers


\[ S := \left\{ a + \frac{b}{2^n} \;\middle|\; a, b, n \in \mathbb{Z} \right\}. \]


(a) Suppose that the student attempts to define a function $f : S \to S$ by
the formula


\[ f\left(a + \frac{b}{2^n}\right) = a - \frac{b}{2^n} \quad \text{for } a, b, n \text{ in } \mathbb{Z}. \]


What is wrong with this definition of f?

(b) Find a description of S which is simpler than the one given at the 
beginning of this exercise. Prove that your description is correct. 

(c) Prove that S is closed under addition and multiplication: that is, do 
Exercise 2.2 with S in place of R. (It will likely be convenient to use the 
description of S from part (b).)

(d) Is the number game on S winnable? Justify your answer.


Exercise 2.5. Let R be as in Equation 2.1. 

(a) Prove that for all x in R, there are unique integers a and b such that
$x = a + b\sqrt{2}$.

(b) Use part (a) of this exercise to explain why the definition of the function 
f given in Equation 2.3 makes sense. (We say that f is well-defined here.) 
Compare to Exercise 2.4 part (a). 


Exercise 2.6. Consider playing the number game on some number system 
V. Here, V could be any set of numbers which is closed under addition and 
multiplication. Suppose that $g : V \to V$ is a bijective function.

(a) Suppose that whenever someone replaces x by g(x) inside the bags, for 
all x in V, then there is no way for the player to tell the difference. Show that 
we must then have 

\[ g(x + y) = g(x) + g(y) \tag{2.4} \]


and
\[ g(x \cdot y) = g(x) \cdot g(y) \tag{2.5} \]


for all x and y in V.

(b) Verify that the function f on R defined by Equation 2.3 satisfies both 
Equations 2.4 and 2.5 from part (a) of this exercise. 

(c) Prove that there are exactly two functions on R which satisfy both 
equations, namely, the function f defined by Equation 2.3 and the identity
function on R. 


3


Groups


3.1 Introduction 


No one would dispute the usefulness of number systems. Our first task in 
abstract algebra is to generalize the concept of a number system by isolating 
their most important properties. We then declare that any system which has 
a certain subset of these specific properties is to be called a group or ring or 
field, etc., according to which subset of the properties it satisfies. 


3.2 Binary Operations


From an algebraic point of view, what constitutes a “number system” such as 
$\mathbb{Z}$ is not just the set $\mathbb{Z}$, but, even more importantly, the operations addition (+)
and multiplication ($\cdot$). Let us focus on addition. What kind of mathematical
object is +? 

To start with, + (on Z) is something that takes two integers as input and 
produces a single integer as output. This suggests that + may be a function. 
If so, what can its domain be? A single element of the domain of + must 
somehow provide us with two arbitrary integers. 

We recall that elementary set theory has a device to capture two elements 
of a set in a single entity: namely, the Cartesian product. An element of Z x Z 
is an ordered pair (a,b) of integers. Thus we want to say that + is actually a 
function 

\[ + : \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}. \]


In fact, this description of + is entirely accurate if we are only willing to allow
the usual function notation
\[ +((a, b)) \]


to be written in the alternative format 


a+b. 


DOI: 10.1201/9781003252139-3


Thus, for example, the equation
\[ 3 + 4 = 7 \]


can be rewritten as 
+( (3,4) ) = 7. 


We say that + is a binary operation on Z since it requires two inputs from 
Z. Next we generalize this discussion by defining what we mean by a “binary 
operation” on an arbitrary set. 


Definition 3.1. Let S be a set. A binary operation on S is a function
\[ \triangle : S \times S \to S. \]
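
On a finite set, the requirement in Definition 3.1 that every output lands back in S can be checked exhaustively. The Python sketch below models a candidate operation as a two-argument function; the name `is_binary_operation` and the two sample operations are hypothetical.

```python
import itertools

def is_binary_operation(op, S):
    """Check that op sends every ordered pair of elements of S back into S."""
    return all(op(a, b) in S for a, b in itertools.product(S, repeat=2))

S = {0, 1, 2, 3, 4}
add_mod_5 = lambda a, b: (a + b) % 5    # addition followed by remainder on division by 5
ordinary_add = lambda a, b: a + b       # ordinary addition of integers
```

Addition modulo 5 is a binary operation on S, while ordinary addition is not, because for example 4 + 4 = 8 is not an element of S.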


Remark 3.2. In the definition above, S can be any set. S is not necessarily a set 
of numbers. We can and will allow S to be a set whose elements are numbers,
or “grids” of numbers (known as matrices), or functions, for example. 


Remark 3.3. Our notation for the output values of a binary operation will 
follow the example of the familiar operations of addition and multiplication 
of numbers. Instead of the usual function notation $\triangle((a, b))$ or its shorthand
$\triangle(a, b)$, we use the “operator notation” $a \triangle b$.


Remark 3.4. We often use the symbol $\cdot$ (“dot”) to denote a binary operation,
as our symbol of choice. Sometimes we refer to $a \cdot b$ as a “product,” and we
may even refer to the operation $\cdot$ as “multiplication,” even though $\cdot$ may have
nothing to do with ordinary multiplication of numbers. Later, we will usually
omit a symbol altogether, and simply write ab for $a \cdot b$. We often use + (“plus”
or “additive” notation) if our operation is commutative (see Definition 3.11
below).


Now that we have a general notion for the kind of thing that addition 
and multiplication of numbers are (namely, binary operations), we recall the 
fundamental properties of these familiar binary operations, so that we may 
define these properties, too, in generality. The properties that we wish to 
consider are associativity, commutativity, the existence of an identity element, 
and the existence of inverses. 


Example 3.5. The associative property of Z under addition can be written as
\[ \forall x, y, z \in \mathbb{Z},\quad (x + y) + z = x + (y + z). \]


To define what it means for a general binary operation to be associative, we 
only need to replace Z and + by a general set and a general binary operation 
on that set: 


Definition 3.6. Let $\cdot$ be a binary operation on a set S. Then $\cdot$ is associative
if
\[ \forall x, y, z \in S,\quad (x \cdot y) \cdot z = x \cdot (y \cdot z). \]


Notice in particular that the expressions $(x \cdot y) \cdot z$ and $x \cdot (y \cdot z)$ make sense:
for $x \cdot y$ is an element of S, and so we can dot it with z.
We can also define the other properties of interest to us in generality: 


Definition 3.7. An identity element for a binary operation $\cdot$ on a set S (also
called an identity element of S under $\cdot$) is an element e of S such that


\[ \forall x \in S,\quad e \cdot x = x \cdot e = x. \]


Remark 3.8. We sometimes describe the defining property of an identity el- 
ement informally by saying that an identity element is “absorbed” by other 
elements. 


Definition 3.9. Let $\cdot$ be a binary operation on a set S which possesses an
identity element e. An inverse of an element a of S (with respect to $\cdot$) is an
element b of S such that

\[ a \cdot b = b \cdot a = e. \]


Remark 3.10. The definition above is symmetric in a and b. Thus we can say 
that b is an inverse of a iff a is an inverse of b. Also, it appears that our 
definition of inverse depends on which identity element we choose, but we 
shall see later (Lemma 3.33) that there can be at most one identity element 
for a given binary operation. 


Definition 3.11. Let $\cdot$ be a binary operation on a set S. Then $\cdot$ is commutative
if
\[ \forall x, y \in S,\quad x \cdot y = y \cdot x. \]


Example 3.12. Let S = {a,b} be a set with two elements, $a \neq b$. To form a
binary operation $\cdot$ on S, we are free to decide what each of the “products”
$a \cdot a$, $a \cdot b$, $b \cdot a$, and $b \cdot b$ should be, as long as each product is again in S.
Some binary operations on S will be associative, while others will not be; and
likewise for the other properties we defined above. Here is one way to define a
binary operation $\cdot$ on S: $a \cdot a = b$, $a \cdot b = b$, $b \cdot a = b$, and $b \cdot b = a$. We observe that
this operation is commutative, since $a \cdot b = b \cdot a = b$. However, our operation
$\cdot$ is not associative, because $a \cdot (b \cdot b) = a \cdot a = b$, but $(a \cdot b) \cdot b = b \cdot b = a$. (See
Example 3.25 for a “multiplication table” format for writing down a binary
operation.)
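
The two claims of Example 3.12 can be checked by brute force. The following sketch stores the four products in a dictionary and tests every pair and every triple; the helper names `dot`, `is_commutative`, and `is_associative` are our own.

```python
from itertools import product

S = ["a", "b"]
# The operation of Example 3.12: a.a = b, a.b = b, b.a = b, b.b = a.
table = {("a", "a"): "b", ("a", "b"): "b", ("b", "a"): "b", ("b", "b"): "a"}

def dot(x, y):
    """Look up the product x . y in the table."""
    return table[(x, y)]

def is_commutative(elements, op):
    """Check x . y == y . x for every ordered pair."""
    return all(op(x, y) == op(y, x) for x, y in product(elements, repeat=2))

def is_associative(elements, op):
    """Check (x . y) . z == x . (y . z) for every ordered triple."""
    return all(op(op(x, y), z) == op(x, op(y, z))
               for x, y, z in product(elements, repeat=3))

print(is_commutative(S, dot))
print(is_associative(S, dot))
```

The associativity check fails on the triple (a, b, b), which is exactly the counterexample given in the text.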


3.3 Groups: Definition and Some Examples


Now it is time to define our first new type of object to study in depth. 


Definition 3.13. A group is a set G together with a function $\cdot$, satisfying the
following 4 axioms:

G0 $\cdot$ is a binary operation on G.

G1 The operation $\cdot$ is associative.

G2 G contains an identity element for the operation $\cdot$.

G3 Every element of G has an inverse in G.


Remark 3.14. By default, the symbol e will denote the identity element of a 
group (and not, for example, Euler’s constant, which is the natural logarithm 
base). Sometimes we will use the notation $e_G$ for e when we wish to indicate
the group in question. Also, there are instances where other notation will be 
used for the identity element of a group, as dictated by circumstances and 
convention; these exceptions will be noted as they come up. 


Remark 3.15. Formally, our group is the ordered pair $(G, \cdot)$. We call G the
underlying set of the group, and we refer to $\cdot$ as the group operation or group
law. If the group operation is understood, we often speak of G alone as a
“group,” even though the operation is more important than the underlying
set G. The same set G may be a group in many different ways: G may admit
many different group laws!


Remark 3.16. Notice that we did not require the commutative property in 
our list of axioms for a group. A group operation need not be commutative. 
This is important, because many algebra rules which are true for numbers fail 
in non-commutative groups. In general, we shall have to re-learn the rules of 
algebra from scratch for each new type of object we study, starting now with 
groups. 


Definition 3.17. A group is called commutative or abelian (rhymes with 
“chameleon”) if its operation is commutative. 


Remark 3.18. When we say that a group is commutative, we mean that all 
pairs of elements in the group commute. We may also say that two particular 
elements commute (i.e., with each other), without requiring our group to be 
commutative. However, we never say that a pair of elements is “abelian”: 
the term abelian is only used in a global sense, to mean that the group is 
commutative. 


Generally speaking, the more properties (axioms) we require to be satisfied, 
the easier the resulting objects are to understand. In this vein, abelian groups 
are easy to work with compared to groups in general. We shall see instances 
of this fact below (see, for example, Warning 3.43 and Lemma 3.44). 


Example 3.19. We claim that (Z, +) is a group. Let us verify the group axioms:

G0: Adding two integers gives an integer, so + is a binary operation on
Z.

G1: Addition on Z is associative, as we have recalled earlier.

G2: We must find an integer e such that $e + x = x = x + e$ for every
integer x. The choice e = 0 works, so 0 is an identity element for (Z, +). The
reader may recall that 0 is sometimes called the additive identity.

G3: Let $x \in \mathbb{Z}$. We must show the existence of an integer y such that
$x + y = 0 = y + x$. The choice $y = -x$ has the desired property. Again, the
reader may have seen $-x$ referred to as the additive inverse of x.


Remark 3.20. For an abelian group, the group operation will frequently be 
denoted by + (“addition”), even if our group is not an ordinary number system 
such as Z. We also frequently use the symbol 0 instead of e to denote the 
identity element in an abelian group. 


Example 3.21. The reader may verify that (R,+) is a group. The details are 
similar to those in the preceding example. 


Example 3.22. In this example, let $\cdot$ be ordinary multiplication on R. This
operation is associative, and possesses an identity element, namely 1. Attempting
to check Axiom G3, we need to solve the equation xy = 1 for an arbitrary
real number x. The solution is y = 1/x. Unfortunately, this solution does not
make sense when x = 0. Moreover, when x = 0, the equation xy = 1 has no
solution. Therefore, 0 has no inverse under multiplication, and so $(\mathbb{R}, \cdot)$ is not
a group.

But let’s try to salvage the situation. Since the number 0 was the only
problem, let us remove it! First, notice that when $\cdot$ is restricted to non-zero
real numbers, we still get a binary operation (on $\mathbb{R} - \{0\}$), because the product
of two non-zero real numbers is non-zero. Multiplication is still associative on
this smaller set, and 1 still acts as an identity element. Also, if $x \in \mathbb{R} - \{0\}$
then $1/x \in \mathbb{R} - \{0\}$. Therefore $(\mathbb{R} - \{0\}, \cdot)$ is a group!


Example 3.23. Consider the set $S := \{\tfrac{1}{2}, 1, 2\}$ under multiplication. We know
that multiplication of real numbers is associative. Further, 1 is an identity
element for multiplication; and for each number in S, its multiplicative inverse
is also in S. Yet S is not a group under multiplication. The reason is that
multiplication is not an operation on S! For example, $2 \in S$, but $2 \cdot 2 \notin S$.

To get a group from S, we need to include all the integer powers of 2. The
reader can verify that the set $\{2^n \mid n \in \mathbb{Z}\}$ is a group under multiplication;
see also Example 4.11.
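
The failure of closure in Example 3.23 is easy to see by machine as well. The sketch below uses exact rational arithmetic from Python's standard `fractions` module; the variable names are our own.

```python
from fractions import Fraction
from itertools import product

S = {Fraction(1, 2), Fraction(1), Fraction(2)}

# Every element of S has its multiplicative inverse in S ...
inverses_in_S = all(1 / x in S for x in S)

# ... but some products of two elements of S land outside S.
escaping = sorted({x * y for x, y in product(S, repeat=2)} - S)

print(inverses_in_S)
print(escaping)
```

The products that escape from S are 1/4 and 4, which are themselves integer powers of 2, in line with the remedy suggested in the example.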


Example 3.24. Let S = {a} be a set with one element. Then there is only one
possible binary operation $\cdot$ on S, namely, $a \cdot a = a$. This operation makes S
into a group! Of course, we have e = a here. A group with only one element
is called a trivial group.


Example 3.25. Let G = {a,b,c}, and define a binary operation $\cdot$ on G by the
following table:

    ·  | a  b  c
    ---+---------
    a  | a  b  c
    b  | b  c  a
    c  | c  a  b


We interpret this table as follows. To compute the value of $x \cdot y$, we read the
entry in the row labeled by x and the column labeled by y. For instance, we
have $c \cdot b = a$, because the row labeled by c is the third row, the column labeled
by b is the second column, and the entry in the third row, second column is
a. Let us see what is involved in checking that $(G, \cdot)$ is a group.

G1: Checking associativity requires 27 verifications. This is because we
must choose 3 arbitrary elements of G, with repetition allowed, where order
matters; for example, choose b, b, and c. Each of these 27 verifications requires
4 computations using the table; in our example, we must check whether
$b \cdot (b \cdot c) = (b \cdot b) \cdot c$, and this requires 4 “dot” computations. Therefore, 108
computations are required in all.

G2: We claim that a is an identity element for $(G, \cdot)$. Checking this only
requires 5 verifications.

G3: The reader can verify that an inverse for a is a, an inverse for b is c,
and an inverse for c is b. This requires at most 6 “dot” computations, with
only 3 required if you take advantage of the symmetry in the definition of
inverse. The 3 relevant computations are: $a \cdot a = a$, $b \cdot c = a$, and $c \cdot b = a$.

It turns out that $\cdot$ is associative, and so $(G, \cdot)$ is a group. But clearly we
should not rely too heavily on brute-force computation or tables to
understand group operations in general: even in this tiny example, well over 100
computations would be needed just to verify that G is a group!
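
For a table this small, a computer can carry out the brute-force check for us. The sketch below is an illustration only: the rows of the table are an assumption, chosen to agree with the facts stated in the example (a is an identity element, and b · c = c · b = a), and the helper names are our own.

```python
from itertools import product

G = ["a", "b", "c"]
# Assumed Cayley table: rows[x] lists x.a, x.b, x.c in that order.
rows = {"a": "abc", "b": "bca", "c": "cab"}

def dot(x, y):
    """Entry in the row labeled x and the column labeled y."""
    return rows[x][G.index(y)]

# G1: one verification for each ordered triple of elements.
triples = list(product(G, repeat=3))
associative = all(dot(dot(x, y), z) == dot(x, dot(y, z)) for x, y, z in triples)

# G2: a is absorbed on both sides by every element.
identity_ok = all(dot("a", x) == x == dot(x, "a") for x in G)

# G3: find, for each x, some y with x.y = a = y.x.
inverses = {x: next(y for y in G if dot(x, y) == "a" == dot(y, x)) for x in G}

print(len(triples), associative, identity_ok, inverses)
```

The list `triples` has 27 entries, matching the count in G1 above.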


Definition 3.26. A table whose rows and columns are labeled by the elements
of a group G, such that the entry in row x and column y is the product $x \cdot y$,
is called a group table or Cayley table for G.


Next we introduce an important family of groups called symmetric groups. 
First we need a preliminary definition. 


Definition 3.27. Let S be a non-empty set. A permutation of S is a bijective
function $f : S \to S$.


Definition 3.28. Let S be a non-empty set. Then
\[ \mathrm{Sym}(S) := \{ f \mid f \text{ is a permutation of } S \}. \]
(Here, “Sym” stands for “symmetric” and is pronounced “sim.”)


We would like to make Sym(S) into a group by finding a suitable binary
operation. Since the elements of Sym(S) are functions, we look for an operation
that produces a function from two given functions. A natural candidate is the
operation of composition of functions. Recall that if we have two functions
$\sigma : A \to B$ and $\tau : B \to C$ (so that the codomain of $\sigma$ equals the domain of
$\tau$), then we can form their composition $\tau \circ \sigma : A \to C$ by using the formula
$(\tau \circ \sigma)(x) = \tau(\sigma(x))$ for all x in A. Because of the following lemma, we call
Sym(S) the symmetric group on S.


Lemma 3.29. If S is a non-empty set, then $(\mathrm{Sym}(S), \circ)$ is a group, where $\circ$
is composition of functions.


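Proof. We verify each of the group axioms.

G0: [Check: $\forall \sigma, \tau \in \mathrm{Sym}(S),\ \sigma \circ \tau \in \mathrm{Sym}(S)$]

Suppose that $\sigma, \tau \in \mathrm{Sym}(S)$. Then $\sigma, \tau : S \to S$, so $\sigma \circ \tau$ is defined, and
is a function from S to S. It is not hard to show that the composition of two
bijections is a bijection (Exercise 3.6), so we have $\sigma \circ \tau \in \mathrm{Sym}(S)$, as desired.

G1: [We must show: $\forall f, g, h \in \mathrm{Sym}(S),\ (f \circ g) \circ h = f \circ (g \circ h)$]

Let $f, g, h \in \mathrm{Sym}(S)$. [Show: $(f \circ g) \circ h = f \circ (g \circ h)$]

[Two functions are equal if they have the same values at all points of their
common domain. So we must show: $\forall x \in S,\ ((f \circ g) \circ h)(x) = (f \circ (g \circ h))(x)$.]

Let $x \in S$. Then $((f \circ g) \circ h)(x) = (f \circ g)(h(x)) = f(g(h(x)))$, and
$(f \circ (g \circ h))(x) = f((g \circ h)(x)) = f(g(h(x)))$.
Therefore, $\circ$ is associative.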

G2: We need to find a permutation e of S such that


\[ e \circ f = f \circ e = f \]


for every permutation f of S. In particular, for each permutation f and each
x in S we need to have e(f(x)) = f(x). Since permutations are surjective, if
we take any element y of S, there is some x in S such that f(x) = y. This
forces e(y) = y for all y in S. So set $e = \mathrm{id}_S$, the identity function on S. One
can verify (Exercise 3.7) that this choice of e works.

G3: [We must show:


\[ \forall f \in \mathrm{Sym}(S)\ \exists g \in \mathrm{Sym}(S) \text{ s.t. } f \circ g = g \circ f = e = \mathrm{id}_S. \]


This should remind us of the properties of an inverse function. Strategy: show
that $f^{-1}$ exists, is in Sym(S), and satisfies the equations for g above.]

Let $f \in \mathrm{Sym}(S)$. Since f is a bijection from S to S, f has an inverse
function $f^{-1} : S \to S$. Now $f^{-1}$ is also bijective (Exercise 3.8), so $f^{-1} \in \mathrm{Sym}(S)$.
We have $f \circ f^{-1} = f^{-1} \circ f = \mathrm{id}_S$, so $f^{-1}$ is an inverse of f in the
group sense as well as the function sense, and we are done.
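
For a small set S, the four axioms in the proof above can also be confirmed by exhaustive search. The following sketch takes S = {1, 2, 3} and stores a permutation f as the tuple (f(1), f(2), f(3)); this encoding and the helper names `compose` and `inverse` are our own choices, not notation from the text.

```python
from itertools import permutations, product

S = (1, 2, 3)
sym = list(permutations(S))   # all bijections S -> S, as tuples of images
identity = S                  # the tuple (1, 2, 3) encodes the identity function

def compose(f, g):
    """Return f o g, the function x -> f(g(x))."""
    return tuple(f[g[x - 1] - 1] for x in S)

def inverse(f):
    """Return the inverse function of the bijection f."""
    g = [0] * len(S)
    for x in S:
        g[f[x - 1] - 1] = x
    return tuple(g)

g0 = all(compose(f, g) in sym for f, g in product(sym, repeat=2))
g1 = all(compose(compose(f, g), h) == compose(f, compose(g, h))
         for f, g, h in product(sym, repeat=3))
g2 = all(compose(identity, f) == f == compose(f, identity) for f in sym)
g3 = all(compose(f, inverse(f)) == identity == compose(inverse(f), f) for f in sym)

print(len(sym), g0, g1, g2, g3)
```

Such a check is no substitute for the proof, which covers every non-empty set S at once, but it is a useful sanity test of the definitions.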


Notation 3.30. In the special case when S = {1,2,3,...,n} for some positive
integer n, we denote the symmetric group on S by $S_n$. That is,


\[ S_n = \mathrm{Sym}(\{1, 2, 3, \ldots, n\}). \tag{3.1} \]


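Example 3.31. For an element $f \in S_n$, let us use the array notation

\[ \begin{bmatrix} 1 & 2 & \cdots & n \\ f(1) & f(2) & \cdots & f(n) \end{bmatrix} \]

to represent f. For instance, let

\[ \sigma = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 3 & 1 & 4 & 2 \end{bmatrix} \]

and let $\tau$ be an element of $S_4$ with $\tau(1) = 4$ [the array for $\tau$ is illegible in the source].
Then $\sigma$ is the function $\sigma : \{1,2,3,4\} \to \{1,2,3,4\}$ such
that $\sigma(1) = 3$, $\sigma(2) = 1$, $\sigma(3) = 4$, and $\sigma(4) = 2$. Let $\alpha = \sigma \circ \tau$, the product
of $\sigma$ with $\tau$ in $S_4$. Then we have $\alpha(1) = (\sigma \circ \tau)(1) = \sigma(\tau(1)) = \sigma(4) = 2$. The
reader should check the remaining values of $\alpha$ in array notation [the arrays for
$\alpha$ and for the product $\sigma \circ \tau$ are illegible in the source].

Notice that we can compute a product of two elements like this in a symmetric
group without ever leaving array notation. For instance, here we compute
$1 \mapsto 4$ and $4 \mapsto 2$ to get $1 \mapsto 2$ in the result. Also notice that we end up
computing this product “from right to left,” which may seem counterintuitive
until we remember that functions are composed by applying the right-hand
function first.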
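
The right-to-left rule can be tried out with dictionaries standing in for arrays. In the sketch below, σ is the permutation of Example 3.31. For τ, only the value τ(1) = 4 is taken from the example; the other three values of τ are hypothetical, chosen just so that τ is a permutation.

```python
# sigma from Example 3.31: top row of the array = keys, bottom row = values.
sigma = {1: 3, 2: 1, 3: 4, 4: 2}

# tau(1) = 4 as in the example; the remaining values are hypothetical.
tau = {1: 4, 2: 3, 3: 1, 4: 2}

def compose(f, g):
    """Return f o g: apply the right-hand function g first, then f."""
    return {x: f[g[x]] for x in g}

alpha = compose(sigma, tau)   # alpha = sigma o tau
beta = compose(tau, sigma)    # tau o sigma, for comparison

print(alpha[1], beta[1])
```

With this choice of τ we get α(1) = σ(τ(1)) = σ(4) = 2, as in the text, while (τ ∘ σ)(1) = τ(3) = 1, so the order of the factors matters.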


3.4 First Results about Groups 


In this section, we start to “tame” groups by proving that some of the 
familiar rules of algebra for numbers still hold true for groups. Just as impor- 
tantly, we shall also see that some results which are true for number systems 
are not true for groups in general; in some cases, these failed results can still 
be shown to hold provided we are dealing with abelian groups. 

As usual in mathematics, we should be careful not to make unspoken as- 
sumptions or make up our own rules when dealing with the algebra of groups. 
Our default position should be that nothing is true unless we can prove it! 

Our first job will be to make sense of an expression like


\[ a_1 \cdot a_2 \cdot a_3 \cdot \cdots \cdot a_n, \tag{3.2} \]


where the $a_i$'s are elements of some group. We frequently write down such
expressions with numbers: e.g.,


\[ 1 + 2 + 3 + 4. \tag{3.3} \]


In fact, our familiarity with ordinary arithmetic is perhaps so great that we 
may not see any problem here at all. What’s the big deal? 


The problem is that our operations, whether + on the set of integers or $\cdot$
on a general group, are supposed to be binary operations. This means that we
can only operate on two elements at a time. Strictly speaking, an expression
such as (3.2) does not make sense (yet)!

Let us see how we handle this problem in the second case, Expression (3.3),
which is routine arithmetic. By adding parentheses to this expression, we can
arrive at a meaningful expression. The trouble now is that we can do so in
several different ways: e.g.,

\[ (1 + 2) + (3 + 4) \]

or

\[ 1 + ((2 + 3) + 4), \]

etc. Now the point is that no matter how we “parenthesize” Expression (3.3),
we always get the same result. The next lemma assures us that the same is
true in any group; it is a consequence of associativity.


Lemma 3.32. Let S be a set with an associative binary operation $\cdot$. Let
$a_1, a_2, \ldots, a_n \in S$, with $n \geq 3$. Then the value of $a_1 \cdot a_2 \cdot \ldots \cdot a_n$ does not depend
on how we compute it: it is independent of the parenthesization.


Proof. Induct on n. (We use “strong induction.”) 


Base Case: n = 3. In this case, there are only two possible parenthesizations
of 3 elements $a_1, a_2, a_3$, namely $(a_1 \cdot a_2) \cdot a_3$ and $a_1 \cdot (a_2 \cdot a_3)$. These are
equal by the hypothesis that our operation is associative.


Inductive Step: Suppose our result is true for up to n − 1 elements. Let
$a_1, \ldots, a_n \in S$. Our strategy will be to show that any parenthesization of
these elements is equal to $a_1 \cdot (a_2 \cdot (a_3 \cdot (\cdots (a_{n-1} \cdot a_n) \cdots )))$. For later use,
let $w = a_2 \cdot (a_3 \cdot (\cdots (a_{n-1} \cdot a_n) \cdots ))$.

Fix some parenthesization of $a_1 \cdot a_2 \cdot \cdots \cdot a_n$, and write it in the form $x \cdot y$ (we
can always do this; there must be a “final” dot which is to be computed last
according to our given parenthesization). The expression x involves, say, $a_1$,
$a_2$, ..., $a_i$, while y involves $a_{i+1}$, ..., $a_n$, with $1 \leq i < n$.

First we suppose $i \geq 2$. By inductive hypothesis, we can rewrite x in the
form $a_1 \cdot z$ for an expression z involving $a_2$ through $a_i$; and we can also say
that the expressions $z \cdot y$ and w must be equal, since both expressions involve
the n − 1 elements $a_2$ through $a_n$ in that order. We get


\[ \begin{aligned}
x \cdot y &= (a_1 \cdot z) \cdot y \\
&= a_1 \cdot (z \cdot y) \\
&= a_1 \cdot (w) \\
&= a_1 \cdot (a_2 \cdot (a_3 \cdot (\cdots (a_{n-1} \cdot a_n) \cdots ))).
\end{aligned} \]


The case i = 1 is similar, but we do not need a “z” in that case. Thus all
parenthesizations of these n elements of S have a common value.
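
The idea of splitting a parenthesized expression at its “final” dot translates directly into a recursive procedure. The sketch below (the function name `all_values` is our own) collects the values of every parenthesization of a list of elements, so that we can compare an associative operation with one that is not associative.

```python
def all_values(items, op):
    """Return the set of values of all parenthesizations of items[0] . items[1] . ... ."""
    if len(items) == 1:
        return {items[0]}
    values = set()
    # i marks the "final" operation: x involves items[:i], y involves items[i:].
    for i in range(1, len(items)):
        for x in all_values(items[:i], op):
            for y in all_values(items[i:], op):
                values.add(op(x, y))
    return values

# String concatenation is associative: every parenthesization agrees.
concat_values = all_values(["a", "b", "c", "d"], lambda x, y: x + y)

# Subtraction of integers is not associative: the values differ.
minus_values = all_values([1, 2, 3, 4], lambda x, y: x - y)

print(concat_values)
print(sorted(minus_values))
```

For concatenation the set has a single element, as Lemma 3.32 predicts; for subtraction it has several, so an expression such as 1 − 2 − 3 − 4 has no meaning until a convention for parentheses is fixed.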



One practical consequence of Lemma 3.32 is that we can omit parentheses 
when writing a product of elements in a group. We take the value of an 
expression like (3.2) to be the value of any particular parenthesization of that 
expression. 

The next two results deal with questions of uniqueness. In math, we often 
want to prove that there is a unique object with certain properties. A standard 
technique to prove uniqueness is to suppose there are two such objects and 
show they are equal. 


Lemma 3.33. If $\cdot$ is a binary operation on a set S, then there is at most one
identity element of S under $\cdot$. In particular, there is only one identity element
in any group.


Proof. Suppose that $\cdot$ is a binary operation on a set S. Suppose that $e, f \in S$
and that both e and f are identity elements of S. [Show e = f] Since e is an
identity and $f \in S$, we have $e \cdot f = f \cdot e = f$. But f is also an identity and
$e \in S$, so we have $f \cdot e = e \cdot f = e$. Thus e = f.


Lemma 3.34. In a group, inverses are unique. More precisely, if G is a group,
then for all $a \in G$, there is a unique $b \in G$ such that $a \cdot b = e$; and for this b,
we also have $b \cdot a = e$.


Proof. Let G be a group and let $a \in G$. By Axiom G3, $\exists b \in G$ s.t. $a \cdot b =
b \cdot a = e$. Now suppose that $c \in G$ and $a \cdot c = e$. [Show c = b] Then $a \cdot b = a \cdot c$,
so $b \cdot (a \cdot b) = b \cdot (a \cdot c)$, $(b \cdot a) \cdot b = (b \cdot a) \cdot c$, $e \cdot b = e \cdot c$, and b = c.


Now we know that the inverse of a group element is unique, so we in- 
troduce notation for it. We use exponential notation, which is suggested by 
the examples both of ordinary multiplication (Example 3.22) and of functions 
(Lemma 3.29). 


Notation 3.35 (Inverse Notation). If a is an element in a group, then $a^{-1}$
denotes the inverse of a.


Remark 3.36. The definition of inverse (Definition 3.9) together with our new
notation says that $a \cdot a^{-1} = a^{-1} \cdot a = e$. To say “$b = a^{-1}$” means $b \cdot a = a \cdot b = e$.
By Lemma 3.34, if a and b satisfy $a \cdot b = e$ then we must have $b = a^{-1}$ and
$a = b^{-1}$.

Warning 3.37. We do not use fraction notation to denote the inverse of a
group element: that is, we will not denote $a^{-1}$ by $\frac{1}{a}$. For one thing, “1” is
not in general an element of our group! But even if we allowed $\frac{b}{a}$ to stand for
$b \cdot a^{-1}$, we would find the familiar rules of fractions violated in general; for
example, $b \cdot \frac{e}{a} \neq \frac{e}{a} \cdot b$ unless b and $a^{-1}$ happen to commute. Therefore, we put
off using fraction notation for now.


Next we will expand our notation to encompass any integer power of a 
group element. Just as with ordinary exponential notation for numbers, this 
notation is a shorthand for repeated “multiplication”—only now, the operation 
of “multiplication” is an arbitrary group law. 


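Definition 3.38. Let G be a group, let $a \in G$, and let n be a positive integer.
Then


\[ \begin{aligned}
a^n &:= \underbrace{a \cdot a \cdot \ldots \cdot a}_{n \text{ factors}}, \\
a^{-n} &:= \underbrace{a^{-1} \cdot a^{-1} \cdot \ldots \cdot a^{-1}}_{n \text{ factors}}, \\
a^0 &:= e.
\end{aligned} \]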


Remark 3.39. Now that we have both multiplicative and exponential notation,
it is natural to ask whether an expression such as $a \cdot b^2$ means $a \cdot (b^2)$ or instead
$(a \cdot b)^2$. We follow the same order of operations as in traditional algebra, so
powers have higher precedence than multiplication, and parentheses have the
highest precedence of all. Thus $a \cdot b^2$ means $a \cdot (b^2)$. Later, when we study
objects with two operations, both + and $\cdot$, we will continue to abide by the
standard order of operations.


Some of the rules for ordinary exponentiation of numbers are still true in 
arbitrary groups. Before reaching our next theorem, which states two such 
rules, we prove a lemma. 


Lemma 3.40. Let a be an element of a group, and let $n \in \mathbb{Z}$. Then
$(a^n)^{-1} = a^{-n}$.


Proof. Case 1: n > 0.
By Remark 3.36, it is enough to verify that $a^n \cdot a^{-n} = e$. When n = 1, this
is immediate from the definition of $a^{-1}$. For n > 1, we have


\[ \begin{aligned}
a^n \cdot a^{-n} &= \underbrace{a \cdot a \cdots a}_{n \text{ factors}} \cdot \underbrace{a^{-1} \cdot a^{-1} \cdots a^{-1}}_{n \text{ factors}} \\
&= \underbrace{a \cdot a \cdots a}_{n-1 \text{ factors}} \cdot a \cdot a^{-1} \cdot \underbrace{a^{-1} \cdot a^{-1} \cdots a^{-1}}_{n-1 \text{ factors}} \\
&= \underbrace{a \cdot a \cdots a}_{n-1 \text{ factors}} \cdot e \cdot \underbrace{a^{-1} \cdot a^{-1} \cdots a^{-1}}_{n-1 \text{ factors}} \\
&= \underbrace{a \cdot a \cdots a}_{n-1 \text{ factors}} \cdot \underbrace{a^{-1} \cdot a^{-1} \cdots a^{-1}}_{n-1 \text{ factors}}.
\end{aligned} \]


We can continue to “cancel” the middle two terms until we reach the identity
element, e, and we are done. More properly, our argument gives a proof by
induction on n.

Case 2: n < 0.

Since −n > 0, Case 1 above gives $(a^{-n})^{-1} = a^{-(-n)} = a^n$. By Remark
3.36, we conclude that $(a^n)^{-1} = a^{-n}$, as desired.

Case 3: n = 0.

Then $a^n = a^0 = e$ and $a^{-n} = a^0 = e$, by Definition 3.38. But $e \cdot e = e$ by
definition of identity, so we have $e^{-1} = e$, and we are done.


Theorem 3.41 (Laws of Exponents). Let G be a group, $a \in G$, and $m, n \in \mathbb{Z}$.
Then

(i) $a^m \cdot a^n = a^{m+n}$, and

(ii) $(a^m)^n = a^{mn}$.


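Proof. Case 1: m, n > 0.
Then for (i) we have


\[ a^m \cdot a^n = \underbrace{a \cdot a \cdots a}_{m \text{ factors}} \cdot \underbrace{a \cdot a \cdots a}_{n \text{ factors}} = \underbrace{a \cdot a \cdots a}_{m+n \text{ factors}} = a^{m+n}. \]

For (ii), we have

\[ (a^m)^n = \underbrace{a^m \cdot a^m \cdots a^m}_{n \text{ factors of } a^m} = \underbrace{a \cdot a \cdots a}_{mn \text{ factors}} = a^{mn}. \]


Case 2: m, n < 0. Left to the reader in Exercise 3.12.

Case 3: m > 0, n < 0.

[Idea: $a^5 \cdot a^{-2} = a^{3+2} \cdot a^{-2} = a^3 \cdot a^2 \cdot a^{-2} = a^3 \cdot e = a^3$. But this is justifiable
based on our current knowledge only when m + n > 0. Otherwise we model
our proof on the following calculation: $a^2 \cdot a^{-5} = a^2 \cdot a^{-2-3} = a^2 \cdot a^{-2} \cdot a^{-3} =
e \cdot a^{-3} = a^{-3}$.]


For (i), we first assume that m + n > 0 to get


\[ \begin{aligned}
a^m \cdot a^n &= a^{m+n-n} \cdot a^n \\
&= a^{m+n} \cdot a^{-n} \cdot a^n && \text{(by Case 1, since } m+n > 0 \text{ and } -n > 0\text{)} \\
&= a^{m+n} \cdot e && \text{(by Lemma 3.40)} \\
&= a^{m+n} && \text{(by definition of identity).}
\end{aligned} \]


Next we prove (i) assuming that m + n < 0:


\[ \begin{aligned}
a^m \cdot a^n &= a^m \cdot a^{-m+m+n} \\
&= a^m \cdot a^{-m} \cdot a^{m+n} && \text{(by Case 2, since } -m < 0 \text{ and } m+n < 0\text{)} \\
&= e \cdot a^{m+n} && \text{(by Lemma 3.40)} \\
&= a^{m+n} && \text{(by definition of identity).}
\end{aligned} \]

If m + n = 0, then m = −n, so $a^m \cdot a^n = a^{-n} \cdot a^n = e = a^0 = a^{m+n}$.


This completes the proof of (i) in Case 3.

Now, for (ii): Let $b = a^m$. By Lemma 3.40, we have $b^n = (b^{-n})^{-1}$. On the
other hand, we have $b^{-n} = (a^m)^{-n} = a^{-mn}$ by Case 1 above, since m > 0
and −n > 0. So $(a^m)^n = b^n = (b^{-n})^{-1} = (a^{-mn})^{-1} = a^{mn}$, as desired.

Case 4: m < 0, n > 0; Case 5: m = 0; Case 6: n = 0: Left to the reader in
Exercise 3.12.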
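
Definition 3.38 and the laws of exponents can be tested numerically in a symmetric group. In the sketch below a permutation of {0, 1, ..., k−1} is stored as the tuple of its images (our own encoding), and `power` follows Definition 3.38 literally: |n| factors of a or of its inverse, with the empty product equal to e.

```python
def compose(f, g):
    """Return f o g: apply g first, then f."""
    return tuple(f[g[i]] for i in range(len(g)))

def inverse(f):
    """Return the inverse permutation of f."""
    g = [0] * len(f)
    for i, image in enumerate(f):
        g[image] = i
    return tuple(g)

def power(a, n):
    """a^n as in Definition 3.38, for any integer n."""
    factor = a if n >= 0 else inverse(a)
    result = tuple(range(len(a)))      # the identity element e
    for _ in range(abs(n)):
        result = compose(result, factor)
    return result

a = (1, 2, 0, 4, 3)                    # an arbitrary sample element of Sym({0,1,2,3,4})
exponents = range(-7, 8)

law_i = all(compose(power(a, m), power(a, n)) == power(a, m + n)
            for m in exponents for n in exponents)
law_ii = all(power(power(a, m), n) == power(a, m * n)
             for m in exponents for n in exponents)

print(law_i, law_ii)
```

The ranges of m and n include positive, negative, and zero exponents, so all six cases of the proof are exercised, though of course only for this one element a.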


Warning 3.42. We do not attempt to define $a^r$ when a is a group element and
$r \notin \mathbb{Z}$.


Warning 3.43. In general for group elements a and b, we may have $(a \cdot b)^n \neq
a^n \cdot b^n$. Equality does, however, hold in an abelian group:


Lemma 3.44. Let G be an abelian group and let $a, b \in G$. Then for all $n \in \mathbb{Z}$,
we have
\[ (a \cdot b)^n = a^n \cdot b^n. \tag{3.4} \]


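Proof. We prove the statement for $n \geq 0$, by induction on n. The case n < 0
is left to the reader as Exercise 3.13.
Base Case: n = 0. In this case, Equation 3.4 is true since $e \cdot e = e$.
Inductive Step: Suppose that $n \geq 0$ and that Equation 3.4 holds. [We must
show: $(a \cdot b)^{n+1} = a^{n+1} \cdot b^{n+1}$]


Then
\[ \begin{aligned}
(a \cdot b)^{n+1} &= (a \cdot b)^n \cdot (a \cdot b) && \text{(by Theorem 3.41)} \\
&= a^n \cdot b^n \cdot a \cdot b && \text{(by inductive hypothesis)} \\
&= a^n \cdot a \cdot b^n \cdot b && \text{(since } G \text{ is abelian and } b^n, a \in G\text{)} \\
&= a^{n+1} \cdot b^{n+1} && \text{(by Theorem 3.41).}
\end{aligned} \]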
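
A concrete instance of Warning 3.43 can be found among permutations. The sketch below uses our own tuple encoding of permutations of {0, 1, 2, 3}; the particular elements a, b, and d are illustrative choices, not taken from the text. The elements a and b do not commute, while a and d do, and only commuting is used in the inductive step above.

```python
def compose(f, g):
    """Return f o g: apply g first, then f."""
    return tuple(f[g[i]] for i in range(len(g)))

def inverse(f):
    """Return the inverse permutation of f."""
    g = [0] * len(f)
    for i, image in enumerate(f):
        g[image] = i
    return tuple(g)

def power(a, n):
    """a^n for any integer n (Definition 3.38)."""
    factor = a if n >= 0 else inverse(a)
    result = tuple(range(len(a)))
    for _ in range(abs(n)):
        result = compose(result, factor)
    return result

a = (1, 0, 2, 3)   # swaps 0 and 1
b = (0, 2, 1, 3)   # swaps 1 and 2; does not commute with a
d = (0, 1, 3, 2)   # swaps 2 and 3; commutes with a

lhs = power(compose(a, b), 2)             # (a.b)^2
rhs = compose(power(a, 2), power(b, 2))   # a^2 . b^2

commuting_ok = all(power(compose(a, d), n) == compose(power(a, n), power(d, n))
                   for n in range(-5, 6))

print(lhs, rhs, commuting_ok)
```

Here (a · b)² is not the identity, although a² · b² is, so Equation 3.4 fails for the pair a, b; for the commuting pair a, d it holds for every exponent tested.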


Notation 3.45 (Additive Notation). In an abelian group whose operation is
written as +, we use a different notation for denoting inverses and exponents.
Namely, we denote the inverse of a by $-a$ instead of $a^{-1}$; and we denote
repeated addition using multiplicative notation, writing $n \cdot a$ instead of $a^n$.
This “additive notation” for exponentiation thus conforms to our usage in
ordinary number systems, and is especially important when we have a set
with two different operations (for instance, both + and $\cdot$).

We present one more result below which can help us solve equations in
groups.

Lemma 3.46 (Cancellation Laws). Let G be a group and let $a, b, c \in G$. Then:

(i) If $a \cdot c = b \cdot c$, then a = b; and

(ii) If $c \cdot a = c \cdot b$, then a = b.

Proof. (i): Suppose that $a \cdot c = b \cdot c$. Then
\[ \begin{aligned}
(a \cdot c) \cdot c^{-1} &= (b \cdot c) \cdot c^{-1} \\
a \cdot (c \cdot c^{-1}) &= b \cdot (c \cdot c^{-1}) \\
a \cdot e &= b \cdot e \\
a &= b.
\end{aligned} \]


The proof of (ii) is similar.


Remark 3.47. The proof of the preceding lemma is probably more important 
than the result itself. Notice that it matters very much that c is on the right 
of the dot on both sides of equation (i), and likewise c is on the left of the 
dot on both sides of (ii). We recommend that the reader avoid thinking of 
“cancelling,” but instead remember to apply the inverse to both sides of an 
equation, and to do so on the same side of the operation. 
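
To see why the side matters, we can search a small non-abelian group for “mixed” equations a · c = c · b in which a and b differ. The sketch below works in the symmetric group on {0, 1, 2}, with permutations stored as tuples of images (our own encoding).

```python
from itertools import permutations, product

def compose(f, g):
    """Return f o g: apply g first, then f."""
    return tuple(f[g[i]] for i in range(len(g)))

S3 = list(permutations(range(3)))

# Cancellation with c on the same side of both products always works ...
right_ok = all(a == b for a, b, c in product(S3, repeat=3)
               if compose(a, c) == compose(b, c))
left_ok = all(a == b for a, b, c in product(S3, repeat=3)
              if compose(c, a) == compose(c, b))

# ... but a.c = c.b does not force a = b.
mixed_failures = [(a, b, c) for a, b, c in product(S3, repeat=3)
                  if compose(a, c) == compose(c, b) and a != b]

print(right_ok, left_ok, len(mixed_failures) > 0)
```

Both cancellation laws of Lemma 3.46 hold for every triple, while the list `mixed_failures` is non-empty: “cancelling” c from opposite sides is not a valid step.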



We conclude this chapter with some examples showing how to apply our 
knowledge of the algebra of groups. From now on, we shall make free use of the 
properties of inverses of group elements, of the laws of exponents, and of the 
absorption property of the identity element. We shall also find it convenient 
to omit the operation symbol (e.g., $\cdot$) sometimes; it should be understood to
be present. 


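Example 3.48. Suppose that x, y, and z are elements of a group. Solve the
following equation for x:
\[ z x y^2 = z^2 y^5. \]


Solution:
\[ \begin{aligned}
z x y^2 &= z^2 y^5 \\
z^{-1} \cdot z x y^2 &= z^{-1} \cdot z^2 y^5 \\
x y^2 &= z y^5 \\
x y^2 \cdot y^{-2} &= z y^5 \cdot y^{-2} \\
x &= z y^3.
\end{aligned} \]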


Example 3.49. Suppose that x, y, and z are elements of a group. Solve the
following equation for x:
\[ z x y^2 = y^3 z^2. \]


Bad Solution:


\[ \begin{aligned}
z x y^2 &= y^3 z^2 \\
x y^2 &= y^3 z \\
x &= y z.
\end{aligned} \]

Good Solution:
\[ \begin{aligned}
z x y^2 &= y^3 z^2 \\
z^{-1} \cdot z x y^2 &= z^{-1} \cdot y^3 z^2 \\
x y^2 &= z^{-1} y^3 z^2 \\
x y^2 \cdot y^{-2} &= z^{-1} y^3 z^2 \cdot y^{-2} \\
x &= z^{-1} y^3 z^2 y^{-2}.
\end{aligned} \]


We cannot simplify this last expression (unless for example our group is
abelian). Notice how careless use of “cancelling” led to an incorrect solution
in our first attempt. We cannot cancel from the left and from the right at the
same time!


3.5 Exercises 


Exercise 3.1. Let S = {a,b} with $a \neq b$.
(a) Write out the “multiplication tables” for all possible binary operations
on S. How many are there? Can you give a general formula for the number of
binary operations on a finite set as a function of the size of the set?

(b) Find at least one binary operation on S which possesses an identity 
element, and one which does not. 

(c) Find at least one binary operation on S which is associative, and one
which is not.

(d) Find at least one binary operation on S which is commutative, and 
one which is not. 

(e) Find at least one binary operation on S which is a group law, and one 
which is not. 


Exercise 3.2. Let G be a group with exactly 2 elements (we say that G has 
order 2). Thus we may write G = {e,x} with x # e, where, as usual, e is the 
identity element of G. 

(a) Can $x^{-1} = e$? Why or why not?

(b) What must $x^{-1}$ be?

(c) Write out the group table for G. 

(d) Prove that every group of order 2 is abelian. 


Exercise 3.3. Prove that if e is the identity element of a group, then $e^n = e$
for all n in Z. Suggestion: first use induction to prove the result for n > 0.


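Exercise 3.4. Let $S = \left\{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} : a, b, c, d \in \mathbb{R} \right\}$. Define a binary operation
$\cdot$ on S by

\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \cdot \begin{bmatrix} u & v \\ w & x \end{bmatrix} = \begin{bmatrix} au + bw & av + bx \\ cu + dw & cv + dx \end{bmatrix}. \]

(a) Prove that $\cdot$ is associative.

(b) Find an element $e \in S$ such that $m \cdot e = e \cdot m = m$ for all $m \in S$.

(c) Is $\cdot$ commutative?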


Exercise 3.5. Let G be a group and let $a, b \in G$. Suppose that $b \cdot a = a^{-1} \cdot b$.
Prove by induction on k that for all $k \in \mathbb{N}$, we have $b \cdot a^k = a^{-k} \cdot b$. Then
extend this formula to all $k \in \mathbb{Z}$.


Exercise 3.6. Prove that if $\sigma : A \to B$ and $\tau : B \to C$ are two bijective
functions, then the composition $\tau \circ \sigma : A \to C$ is also bijective.


Exercise 3.7. Prove that for any non-empty set S, the identity function on S
is an identity element of Sym(S) under composition of functions.


Exercise 3.8. Prove that the inverse function of a bijection is also a bijection. 


Exercise 3.9. Let $G = S_3$ be the symmetric group on {1, 2, 3}.

(a) Write out all of the elements of G using array notation (see Example
3.31). You should have 6 elements in all.

(b) Show that, in general, we have $|S_n| = n!$ for any positive integer n.
Recall that $n! := 1 \cdot 2 \cdot 3 \cdot \cdots \cdot (n-1) \cdot n$ and is called the factorial of n.

(c) Write the group table for $S_3$. You may find it convenient to label each
element with a single letter.



Exercise 3.10. Suppose that $a, b, c, d \in S$ and $\cdot$ is a binary operation on S.
Consider the expression $a \cdot b \cdot c \cdot d$.

(a) Find all possible parenthesizations of this expression. How many are
there?

(b) For each parenthesization you found in part (a) above, write down the
expressions x and y as described in the proof of Lemma 3.32.

(c) Assuming that $\cdot$ is associative, show directly that all parenthesizations
of this expression have the same value.


Exercise 3.11. Decide, with proof, whether each of the following is a group:


1. Z under ordinary multiplication;


2. R with the operation $\cdot$ given by $x \cdot y = \dfrac{x + y}{1 + x^2 y^2}$;


3. $\mathbb{R}^2$ with the operation $\oplus$ given by $(a,b) \oplus (c,d) = (a + c, b + d)$;


4. The set of all irrational real numbers under ordinary multiplication;


5. The set $V = \{x \in \mathbb{R} : -c < x < c\}$ with the operation $\oplus$ given
by $x \oplus y = \dfrac{x + y}{1 + xy/c^2}$, where c is a fixed positive real number (this is
the velocity addition law in special relativity theory, when c is the
speed of light and the velocities are parallel).


Exercise 3.12. Finish the proof of Theorem 3.41 in Cases 2, 4, 5, and 6. 


Exercise 3.13. Prove that if a and b are elements of an abelian group, then
for all positive integers n, we have $(a \cdot b)^{-n} = a^{-n} \cdot b^{-n}$. (Use induction on n.)


Exercise 3.14. Generalize the result of Lemma 3.44 as follows. Prove that
if $a_1, \ldots, a_k$ are elements of an abelian group, and $n \in \mathbb{Z}$, then we have
$(a_1 \cdots a_k)^n = a_1^n \cdots a_k^n$.


Exercise 3.15. Let G be a group and let $x_1, x_2 \in G$. Find a formula for
$(x_1 x_2)^{-1}$ by solving for z in the equation $x_1 x_2 \cdot z = e$. Then generalize
your result by finding a formula for $(x_1 x_2 \cdots x_n)^{-1}$ for arbitrary elements
$x_1, \ldots, x_n \in G$.


Exercise 3.16. We have seen in Example 3.22 how to shrink the set R to get a 
group under multiplication. In Example 3.23, on the other hand, we enlarged 
the underlying set that we started with in order to produce a group. Is it 
possible to enlarge R to get a group under multiplication, by adding a single 
new element and suitably extending the definition of multiplication? Prove 
your claim. 


Exercise 3.17. In Example 3.25 we did not stipulate that a, b, and c are
distinct. If, perversely, we assume that a = b and that the table for the operation
$\cdot$ is still valid, then show that we must in fact have a = b = c, so |G| = 1.
Thus $(G, \cdot)$ is still a group!


Exercise 3.18. Suppose that $(S, \cdot)$ is a group, where $S \subseteq \mathbb{R}$ and $\cdot$ is ordinary
multiplication. Can $0 \in S$? Prove your claim.


4


Subgroups 


4.1 Groups Inside Groups 


In this chapter, we begin to explore the relationships that groups can have 
with each other. In particular, we examine the situation in which one group 
can “live inside” another group. In addition to shedding light on relationships 
among different groups, we will find a process to actively seek groups inside 
of a given group. 

We have seen that both (Z,+) and (R,+) are groups. They seem related.
How exactly are they related to each other? Well, $\mathbb{Z} \subseteq \mathbb{R}$, and + on Z is
“inherited” from + on R. Let’s formalize and generalize this idea. What needs
to be clarified is the “inheritance” concept.

Recall that if $f : A \to B$ is a function, then we can restrict f to a subset
of its domain. That is, given any subset C of A, we can consider the function

$$g : C \to B$$

given by the rule

$$g(x) = f(x) \text{ for all } x \text{ in } C.$$

We use the notation $g = f|_C$, read “g is the restriction of f to C.” We also
say in this situation that f is an extension of g to A. The point is that f and
g produce the same output on any input that is in the intersection of their
domains.
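
For functions on finite sets, restriction can be sketched in a few lines of Python. This is only an illustration: modelling a function as a dictionary, and the names `restrict`, `f`, and `g`, are assumptions made here and do not come from the text.

```python
def restrict(f, subset):
    """Return the restriction f|_C of a dict-encoded function f to `subset`."""
    missing = set(subset) - set(f)
    if missing:
        raise ValueError(f"not a subset of the domain: {missing}")
    return {x: f[x] for x in subset}

# f : A -> B with A = {-2, -1, 0, 1, 2} and f(x) = x^2 (illustrative choice).
f = {x: x * x for x in range(-2, 3)}

# g = f|_C for C = {0, 1, 2}.
g = restrict(f, {0, 1, 2})

# f and g give the same output on every input in the domain of g.
agree = all(g[x] == f[x] for x in g)
print(g, agree)
```

Here f is an extension of g, and g is determined completely by f together with the choice of the subset C.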

To make the relationship between $(\mathbb{Z}, +)$ and $(\mathbb{R}, +)$ clear,
we should first find a way to distinguish between the two “+” operations, for
they technically have different domains. Let us use the notation
$+_{\mathbb{Z}}$ and $+_{\mathbb{R}}$ to distinguish these operations. Thus

$$+_{\mathbb{Z}} : \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}$$

is integer addition, and

$$+_{\mathbb{R}} : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$$

is real-number addition. These two operations are related by restriction:
namely, $+_{\mathbb{Z}} = +_{\mathbb{R}}|_{\mathbb{Z} \times \mathbb{Z}}$. This
discussion naturally leads to the following definition.


DOI: 10.1201/9781003252139-4




Definition 4.1. Let $(G, \cdot)$ be a group. A subgroup of $(G, \cdot)$ is a
group $(H, \triangle)$ such that $H \subseteq G$ and
$\triangle = \cdot|_{H \times H}$.


Notation 4.2. We write $(H, \triangle) \le (G, \cdot)$, or simply $H \le G$, to
mean H is a subgroup of G. The notation $H < G$ means that H is a proper
subgroup of G: that is, $H \le G$ and $H \ne G$. We write $H \not\le G$ to mean
H is not a subgroup of G.


Example 4.3. $(\mathbb{Z}, +_{\mathbb{Z}}) \le (\mathbb{R}, +_{\mathbb{R}})$.
Indeed, this was our motivating example. On the other hand,
$(\mathbb{R} - \{0\}, \cdot) \not\le (\mathbb{R}, +)$, because multiplication
and addition do not always give the same result when applied to non-zero real
numbers; for example, $1 + 1 \ne 1 \cdot 1$.

To understand a given group G, one of the first questions we shall ask is,
what are its subgroups? Given a group $(G, \cdot)$ and a subset H of G, there
is at most one binary operation $\triangle$ on H that makes $(H, \triangle)$ a
subgroup of $(G, \cdot)$: namely, $\triangle = \cdot|_{H \times H}$. Because of
this, it makes sense to simply write $H \le G$ to mean
$(H, \cdot|_{H \times H}) \le (G, \cdot)$ when the operation on G is
understood, and there is seldom a need to use a separate symbol for the group
law on H. A natural question is, for which subsets H of G is $H \le G$? One way
to answer this question will be presented shortly, in Theorem 4.8.

Let us consider Axiom (G2). If $H \le G$, then each group, H and G, must
contain an identity element, which we can denote $e_H$ and $e_G$, respectively.
The next result says that the identity elements of a group and a subgroup
must coincide.


Lemma 4.4. Let G be a group and suppose $H \le G$. Then $e_G = e_H$.


Proof. Since $e_H$ is an identity element of H, we have $e_H \cdot e_H = e_H$.
Also, since $e_G$ is an identity element of G and $e_H \in G$, we have
$e_H \cdot e_G = e_H$. So $e_H \cdot e_H = e_H \cdot e_G$. By the Cancellation
Laws for G (Lemma 3.46), we have $e_H = e_G$.


As a consequence of the previous lemma, whenever we consider a group 
together with some of its subgroups, we can simply use e to denote their 
shared identity element. Another convenience is that the notion of “inverse” 
coincides for a group and a subgroup: 


Corollary 4.5. Suppose that G is a group, $H \le G$, and $x \in H$. Then the
inverse of x considered as an element of the group H is equal to the inverse
of x considered as an element of the group G.


Proof. Suppose that $H \le G$ and $x \in H$. Let a and b denote the inverse of
x as an element of H and of G, respectively. Then
$a \cdot x = e = b \cdot x$, so a = b by Cancellation in G.


Because of Corollary 4.5, it is safe to use the notation $x^{-1}$ for the
inverse of x even when there are multiple groups under consideration, so long
as they are subgroups of a common “parent” group.




Certainly, the empty set cannot be a group, since it has no identity element;
indeed, it has no elements at all. Two not so trivial things could prevent
$H \le G$. First, the operation of G may not restrict to a binary operation on
H. For example, let $H = \{1, 2\} \subseteq \mathbb{R} - \{0\}$. Then ordinary
multiplication doesn’t restrict well to H; see Example 3.23. The condition that
is wanted here comes up so often that it has a name:


Definition 4.6. Let $(G, \cdot)$ be a group, and let $H \subseteq G$. Then H is
closed under $\cdot$ if
$$\forall x, y \in H, \quad x \cdot y \in H.$$


Second, H may not contain all the inverses of its own elements: for instance,
$\mathbb{N} \subseteq \mathbb{Z}$ and $1 \in \mathbb{N}$, but
$-1 \notin \mathbb{N}$, so $\mathbb{N}$ is not a subgroup of $\mathbb{Z}$ under
addition. We next record the property that is missing here:


Definition 4.7. Let G be a group and let $H \subseteq G$. Then H is closed
under inverses if
$$\forall x \in H, \quad x^{-1} \in H.$$


The following theorem assures us that these are the only things that can 
go wrong: 


Theorem 4.8 (Subgroup Test). Let G be a group and let $H \subseteq G$. Then
$H \le G$ iff all three of the following conditions are satisfied:


ST0 H is non-empty;
ST1 H is closed under inverses;
ST2 H is closed under the group operation of G.


Proof. ($\Rightarrow$) Suppose $H \le G$. [We must show ST0, ST1, and ST2]

ST0: Since H is a subgroup of G, H contains the identity element e of G
by Lemma 4.4, so H must be non-empty.

ST1: Let $x \in H$. By Axiom (G3) for H, there is an inverse y of x in H.
By Corollary 4.5, we have $y = x^{-1}$, the inverse of x in G.

ST2: Let $x, y \in H$. Since $(H, \cdot|_{H \times H})$ is a group, the
restriction of $\cdot$ to $H \times H$ is a binary operation on H:

$$\cdot|_{H \times H} : H \times H \to H.$$

Thus we have $x \cdot y \in H$.

($\Leftarrow$) Suppose that ST0, ST1, and ST2 are satisfied.

[We must show $H \le G$, which means $(H, \cdot|_{H \times H})$ is a group]

Let $\triangle = \cdot|_{H \times H}$. We proceed to verify the group axioms
for $(H, \triangle)$.

G0: We must show that the range of $\triangle$ is a subset of H. This is
exactly the condition ST2.


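G1: Let $x, y, z \in H$. Then

$$\begin{aligned}
(x \triangle y) \triangle z &= (x \cdot y) \cdot z && \text{(since $\triangle$ is the restriction of $\cdot$)} \\
&= x \cdot (y \cdot z) && \text{(by Axiom G1 for G)} \\
&= x \triangle (y \triangle z) && \text{(since $\triangle$ is the restriction of $\cdot$).}
\end{aligned}$$

G2: By ST0, there is some element x in H. We have $x^{-1} \in H$ (by ST1),
and so $x \cdot x^{-1} \in H$ (by ST2). Thus $e_G \in H$. We verify that $e_G$
is an identity element for $\triangle$. Let $a \in H$. Then
$a \triangle e_G = a \cdot e_G = a$ and $e_G \triangle a = e_G \cdot a = a$.
This establishes Axiom G2.

G3: Let $x \in H$. Then by ST1, we have $x^{-1} \in H$. Now
$x \triangle x^{-1} = x \cdot x^{-1} = e_G$ and
$x^{-1} \triangle x = x^{-1} \cdot x = e_G$. From the paragraph above, $e_G$ is
an identity element for $\triangle$, so we are done.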


Remark 4.9. When we use the Subgroup Test in the “backwards” direction,
to prove $H \le G$, we usually establish ST0 by showing $e \in H$. Lemma 4.4
assures us that this strategy is a good one.


Remark 4.10. Again, when used to prove $H \le G$, the Subgroup Test never
requires us to verify Axiom G1. This is fortunate, as associativity is in some
sense the toughest of the group axioms to check: see Example 3.25. Notice how,
in the proof of the Subgroup Test, associativity for $\triangle$ followed
automatically from associativity for $\cdot$.


Example 4.11. We will use the Subgroup Test to verify that the set
$$H := \{2^n \mid n \in \mathbb{Z}\}$$

is a subgroup of $G := \mathbb{R} - \{0\}$ under ordinary multiplication.

ST0: We have $2^0 \in H$, so H is non-empty.

ST1: Let $x \in H$. Then $x = 2^n$ for some n in $\mathbb{Z}$. In the group G,
the inverse of x is $2^{-n}$. But $-n \in \mathbb{Z}$, so $2^{-n} \in H$.

ST2: Let $x, y \in H$. Then $x = 2^m$ and $y = 2^n$ for some
$m, n \in \mathbb{Z}$. So $x \cdot y = 2^{m+n}$, and $2^{m+n} \in H$ since
$m + n \in \mathbb{Z}$.
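
For a finite subset of a group, the three conditions of the Subgroup Test can be checked mechanically. The following Python sketch is an illustration only: the function name `is_subgroup` is made up here, and the ambient group (the integers 0, ..., 11 under addition modulo 12) is an assumption chosen because it is finite, not an example taken from the text.

```python
from itertools import product

def is_subgroup(H, op, inverse):
    """Subgroup Test for a finite subset H of a group G.

    `op` and `inverse` are the operation and inverse map of the ambient
    group G; H is assumed to be a subset of G.
    """
    H = set(H)
    st0 = len(H) > 0                                    # ST0: non-empty
    st1 = all(inverse(x) in H for x in H)               # ST1: closed under inverses
    st2 = all(op(x, y) in H for x, y in product(H, H))  # ST2: closed under op
    return st0 and st1 and st2

# Assumed ambient group: integers mod 12 under addition.
n = 12
add = lambda x, y: (x + y) % n
neg = lambda x: (-x) % n

print(is_subgroup({0, 4, 8}, add, neg))   # True
print(is_subgroup({0, 4, 6}, add, neg))   # False: e.g. 4 + 6 = 10 is missing
print(is_subgroup(set(), add, neg))       # False: fails ST0
```

Note that the check never tests associativity, in line with Remark 4.10: it is inherited from the ambient group.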


4.2 The Subgroup Generated by a Set 


We have seen (e.g. in Example 3.23) that not every subset of a group is a 
subgroup; and now we have a test, the Subgroup Test, that lets us check 
whether a given subset is a subgroup. Next we ask, given a group G and a
subset $S \subseteq G$, what is the smallest subgroup H of G which contains S?
Questions such as this are often fruitful in mathematics, but they do not 
always have answers! As a simple example, we could ask, What is the smallest 
positive real number? Of course, there isn’t one. In the case of our question 
about subgroups, however, we will find that there is always an answer. 




Suppose for the moment that we know some elements $s_1, s_2, \ldots, s_k$ of S
(not assumed to be distinct). Since H is required to contain S, we certainly
must have $s_i \in H$ for each i. Since $H \le G$, the Subgroup Test forces
$s_i^{-1} \in H$ for each i, since H must be closed under inverses by ST1. We
can summarize what we now know by stating that $s_i^{\epsilon} \in H$ for each
i in $\{1, \ldots, k\}$ and each $\epsilon$ in $\{1, -1\}$.

Next, using the closure of H under the group operation (ST2), we can say
that the product of any two such terms must lie in H. Using closure under
$\cdot$ repeatedly, we conclude that
$s_1^{\epsilon_1} s_2^{\epsilon_2} \cdots s_k^{\epsilon_k} \in H$ for any
choice of $\epsilon_1, \epsilon_2, \ldots, \epsilon_k$ in $\{1, -1\}$.

The next result says that we need seek no further: 


Theorem 4.12. Let G be a group and let $S \subseteq G$ with S non-empty. Let
$$H = \{ s_1^{\epsilon_1} s_2^{\epsilon_2} \cdots s_k^{\epsilon_k} \mid s_1, \ldots, s_k \in S,\ \epsilon_1, \ldots, \epsilon_k \in \{1, -1\} \}.$$


Then $H \le G$. Further, H is the smallest subgroup of G which contains S, in
the following sense: if K is any subgroup of G which contains S, then
$H \subseteq K$.


Remark 4.13. The elements $s_1, \ldots, s_k$ do not need to be distinct! There
may be repetitions in this list.


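Proof. We use the Subgroup Test.

ST0: Since S is non-empty, $\exists s \in S$. So $s \in H$ (taking k = 1,
$s_1 = s$, and $\epsilon_1 = 1$), and H is non-empty. (This argument also shows
that $S \subseteq H$.)

ST1: Let $x \in H$. [Show $x^{-1} \in H$] Then we can write
$x = s_1^{\epsilon_1} s_2^{\epsilon_2} \cdots s_k^{\epsilon_k}$ with
$s_i \in S$ and $\epsilon_i \in \{1, -1\}$ (by definition of H). Now the
inverse of x is given by the formula

$$\begin{aligned}
x^{-1} &= (s_k^{\epsilon_k})^{-1} \cdots (s_2^{\epsilon_2})^{-1} (s_1^{\epsilon_1})^{-1} && \text{(by Exercise 3.15)} \\
&= s_k^{-\epsilon_k} \cdots s_2^{-\epsilon_2} s_1^{-\epsilon_1} && \text{(by Theorem 3.41).}
\end{aligned}$$

Since $\epsilon_i \in \{1, -1\}$, we also have $-\epsilon_i \in \{-1, 1\}$,
and so $x^{-1}$ is of the right form to belong to H: namely, a product of
elements of S raised to powers of $\pm 1$.

ST2: Let $x, y \in H$. [Show $xy \in H$] Then we can write
$x = s_1^{\epsilon_1} s_2^{\epsilon_2} \cdots s_k^{\epsilon_k}$ and
$y = t_1^{\alpha_1} t_2^{\alpha_2} \cdots t_\ell^{\alpha_\ell}$ with
$s_i, t_j \in S$ and $\epsilon_i, \alpha_j \in \{1, -1\}$. So we have

$$xy = s_1^{\epsilon_1} s_2^{\epsilon_2} \cdots s_k^{\epsilon_k} \, t_1^{\alpha_1} t_2^{\alpha_2} \cdots t_\ell^{\alpha_\ell},$$

which is again of the right form to be in H.

The last statement in the Theorem follows from the fact that a subgroup
is closed under inverses and the group operation, as outlined in the argument
immediately preceding this proof.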


Notation 4.14. The group H defined in the statement of Theorem 4.12 is
denoted $\langle S \rangle$, and called the group generated by S. If
$S = \{s_1, \ldots, s_n\}$ is a finite set, then we also use the notation
$\langle s_1, \ldots, s_n \rangle$ instead of
$\langle \{s_1, \ldots, s_n\} \rangle$ for the group generated by S. In case
$S = \emptyset$, we interpret $\langle S \rangle$ to mean $\langle e \rangle$,
which is the smallest subgroup of all (see Exercise 4.1).
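
In a finite group, the set of all products of elements of S and their inverses can be computed by repeatedly extending products by one more factor until nothing new appears. The sketch below does this in Python; the name `generated_subgroup` and the choice of ambient group (integers mod 12 under addition) are assumptions made for illustration, and the search terminates only because the ambient group is finite.

```python
def generated_subgroup(S, op, inverse, identity):
    """Return <S> inside a finite group by closing S under op and inverses."""
    gens = set(S) | {inverse(s) for s in S}   # the terms s^{+1} and s^{-1}
    H = {identity}                            # also covers the convention for empty S
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = op(x, g)                      # extend a product by one more factor
            if y not in H:
                H.add(y)
                frontier.append(y)
    return H

# Assumed ambient group: integers mod 12 under addition.
n = 12
add = lambda x, y: (x + y) % n
neg = lambda x: (-x) % n

print(sorted(generated_subgroup({8}, add, neg, 0)))      # [0, 4, 8]
print(sorted(generated_subgroup({3, 8}, add, neg, 0)))   # all of 0, ..., 11
print(sorted(generated_subgroup(set(), add, neg, 0)))    # [0]
```

Starting the search from the identity is harmless for non-empty S, since the identity equals a product of the form $s \cdot s^{-1}$, and it reproduces the convention that the empty set generates the trivial subgroup.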




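Example 4.15. The simplest application of Theorem 4.12 occurs when S contains
just one element. Let G be a group and let $a \in G$. Set $S = \{a\}$. Then
we have

$$\begin{aligned}
\langle S \rangle &= \{ s_1^{\epsilon_1} s_2^{\epsilon_2} \cdots s_k^{\epsilon_k} \mid s_i \in S,\ \epsilon_i \in \{1, -1\} \} && \text{(by def. of $\langle S \rangle$)} \\
&= \{ a^{\epsilon_1} a^{\epsilon_2} \cdots a^{\epsilon_k} \mid \epsilon_i \in \{1, -1\} \} && \text{(since $S = \{a\}$)} \\
&= \{ a^{\epsilon_1 + \epsilon_2 + \cdots + \epsilon_k} \mid \epsilon_i \in \{1, -1\} \} && \text{(by Laws of Exponents)} \\
&= \{ a^n \mid n \in \mathbb{Z} \},
\end{aligned}$$

this last description coming from the fact that an arbitrary finite sum
$\epsilon_1 + \epsilon_2 + \cdots + \epsilon_k$ of 1’s and −1’s can give any
integer n as a result. According to Theorem 4.12, this set is a subgroup of G;
according to our notational rules, we may also denote it by
$\langle a \rangle$, which we generally prefer.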


The case of a group generated by a single element is so important that it 
has its own terminology: 


Definition 4.16. Let H be a group. If $H = \langle a \rangle$ for some
$a \in H$, then H is called cyclic, and we say that a is a generator for H.


Warning 4.17. In Definition 4.16, we need $a \in H$, as opposed to
$a \subseteq H$. In the notation $\langle x \rangle$, it is important to know
whether x represents a set or an element. In both cases, $\langle x \rangle$
will be a group. When x is a single element of a group, then
$\langle x \rangle$ is a cyclic group; but if x is a set of size greater than
1, then $\langle x \rangle$ will in general not be cyclic. To help avoid
confusion, we try to use lower-case letters to stand for elements and
upper-case letters to stand for sets.


Notice that in Definition 4.16 we did not say that H was a subgroup 
of some other group. Even though we began this section with the notion of 
constructing subgroups within a given group, we have arrived at an “intrinsic” 
or “absolute” (not relative) notion of what it means to be cyclic. 

Nevertheless, we can also apply our relative point of view, and assert
that every group G must contain (usually a lot of) cyclic subgroups: just pick
any $g \in G$ and then we can form the cyclic subgroup $\langle g \rangle$ of G.


Example 4.18. Consider the group $(\mathbb{Z}, +)$. Notice that “$3^2$”
actually means 3 + 3 here, since the group operation is addition. Thus we use
additive notation here, and write $2 \cdot 3$ instead of “$3^2$” in this group.
Conveniently, this notation agrees with the interpretation of $\cdot$ as
ordinary multiplication.

More generally, we write $n \cdot x$ instead of “$x^n$” in the group
$(\mathbb{Z}, +)$, for x and n in $\mathbb{Z}$. Thus for any integer x, the
group generated by x is

$$\langle x \rangle = \{ n \cdot x : n \in \mathbb{Z} \},$$

the set of all integer multiples of x. Since every integer is an integer
multiple of 1, we have $\langle 1 \rangle = \mathbb{Z}$, so $\mathbb{Z}$ is a
cyclic group under addition. We also have $\langle -1 \rangle = \mathbb{Z}$.
In general, a cyclic group will have more than one generator, unless the group
has order 1 or 2.




Example 4.19. Consider the group $(\mathbb{Q}^*, \cdot)$, where
$\mathbb{Q}^* = \mathbb{Q} - \{0\}$ and $\cdot$ is ordinary multiplication. If
$\mathbb{Q}^*$ were cyclic, then we would have
$\mathbb{Q}^* = \langle \tfrac{a}{b} \rangle = \{ (\tfrac{a}{b})^n \mid n \in \mathbb{Z} \}$
for some non-zero integers a and b. But if p is any prime number which does
not divide a or b, then the equality $(\tfrac{a}{b})^n = p$ is impossible for
any integer n. Therefore, $(\mathbb{Q}^*, \cdot)$ is not cyclic.

Next we come to an important general property of subgroups: the inter- 
section of any collection of subgroups of a given group is a subgroup of that 
group. The proof is entirely formal. 


Lemma 4.20. Let G be a group, and let $\mathcal{C}$ be a non-empty collection
of subgroups of G. Then $\bigcap_{H \in \mathcal{C}} H \le G$.


Proof. Set $I = \bigcap_{H \in \mathcal{C}} H$. We shall use the Subgroup Test
to verify that $I \le G$.

ST0: [Show $e \in I$; this means $\forall H \in \mathcal{C}, e \in H$] Let
$H \in \mathcal{C}$. By Lemma 4.4, $e \in H$. Since H was an arbitrary member
of $\mathcal{C}$, we have $e \in I$. So I is non-empty.

ST1: Let $x \in I$. Then $\forall H \in \mathcal{C}, x \in H$. [Show
$x^{-1} \in I$; this means $\forall H \in \mathcal{C}, x^{-1} \in H$] Let
$H \in \mathcal{C}$. Then $x \in H$ and $H \le G$. Now H is closed under
inverses (by ST1 for H), so we have $x^{-1} \in H$. Since H was an arbitrary
member of $\mathcal{C}$, we have $x^{-1} \in I$.

ST2: Let $x, y \in I$. [Show $x \cdot y \in I$] Let $H \in \mathcal{C}$. Then
$x, y \in H$ and $H \le G$. Now H is closed under the group operation (by ST2
for H), so $x \cdot y \in H$. Since H was an arbitrary member of
$\mathcal{C}$, we have $x \cdot y \in I$.


Armed with Lemma 4.20, we return to the question of finding the smallest
subgroup of a group G which contains a given subset S of G. Consider the
collection

$$\mathcal{C} := \{ H \mid H \le G \text{ and } S \subseteq H \}$$

of all subgroups of G which contain S. We are guaranteed by Lemma 4.20
that the intersection of all subgroups of G containing S,

$$I = \bigcap_{H \in \mathcal{C}} H,$$

is a subgroup of G. A bit of thought shows that I is the smallest possible
subgroup of G which contains S. Comparing this result with Theorem 4.12,
we conclude that $I = \langle S \rangle$. We thus have two ways of viewing the
smallest subgroup of G containing S: the “bottom-up” view of Theorem 4.12,
where we build $\langle S \rangle$ up from elements of S and their inverses,
and the “top-down” view we just saw, where we arrive at $\langle S \rangle$ by
starting with G and paring away the excess, intersecting all the subgroups
containing S.
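
The agreement of the two views can be checked by brute force in a very small group. The Python sketch below is an illustration under stated assumptions: the ambient group is taken to be the integers mod 6 under addition (small enough that all 64 subsets can be examined), and the names `top_down` and `bottom_up` are invented here.

```python
from itertools import combinations, product

n = 6                                   # assumed ambient group: integers mod 6
G = range(n)
add = lambda x, y: (x + y) % n
neg = lambda x: (-x) % n

def is_subgroup(H):
    """Subgroup Test (ST0, ST1, ST2) for a subset H of G."""
    return (len(H) > 0
            and all(neg(x) in H for x in H)
            and all(add(x, y) in H for x, y in product(H, H)))

def all_subgroups():
    """Test every subset of G; feasible only because G is tiny."""
    subsets = (frozenset(c) for r in range(n + 1) for c in combinations(G, r))
    return [H for H in subsets if is_subgroup(H)]

def top_down(S):
    """Intersect every subgroup of G that contains S."""
    result = set(G)
    for H in all_subgroups():
        if set(S) <= H:
            result &= H
    return result

def bottom_up(S):
    """Build up from 0 by adding elements of S and their negatives."""
    gens = set(S) | {neg(s) for s in S}
    H, frontier = {0}, [0]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = add(x, g)
            if y not in H:
                H.add(y)
                frontier.append(y)
    return H

for S in [{2}, {3}, {2, 3}, set()]:
    print(sorted(S), sorted(top_down(S)), top_down(S) == bottom_up(S))
```

The top-down computation is far more expensive, since it inspects every subset of G; it is included only to mirror the argument in the text.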


4.3. Exercises 


Exercise 4.1. Let G be a group. 




(a) Prove that $G \le G$.
(b) Prove that $\{e\} \le G$.


Exercise 4.2 (The Two-Thirds Rule). Let $H \le G$, and suppose that
$x, y, z \in G$ and $x \cdot y = z$. Prove that if any two of these elements
are in H, then so is the third.


Exercise 4.3. Let H, K, and G be groups.
(a) Prove that if $H \le K$ and $K \le G$, then $H \le G$.
(b) Prove that if $H \le G$, $K \le G$, and $H \subseteq K$, then $H \le K$.


Exercise 4.4. Decide, with proof, whether each of the following sets is a sub- 
group of R under ordinary addition: 


1. $\mathbb{Q}$
2. $\{ a + b\sqrt{2} : a, b \in \mathbb{Z} \}$
3. The set of all irrational numbers together with 0
4. $\{ x^3 : x \in \mathbb{R} \}$ (Note: “$x^3$” means $x \cdot x \cdot x$, and
not x + x + x here.)


Exercise 4.5. Prove that any group of order 2 is cyclic. (See Exercise 3.2.) 


Exercise 4.6. Let G = S3. 

(a) Find all of the cyclic subgroups of G. How many are there? What are 
their orders? 

(b) Find two elements $x, y \in G$ which do not commute.

(c) Show that the x and y that you found in part (b) generate G; that is,
show that $G = \langle x, y \rangle$. (How did I know this would happen?)


Exercise 4.7. Consider the group (Z, +). 

(a) Describe the subgroup $\langle n \rangle$ as simply as possible, where n
is an arbitrary integer.

(b) What is your first reaction to the question: Is $\langle 4, 6 \rangle$
cyclic?

(c) Prove that $\langle 4, 6 \rangle = \langle 2 \rangle$. Is
$\langle 4, 6 \rangle$ cyclic?

(d) Make a conjecture about the nature of the subgroup
$\langle a, b \rangle$ for general integers a and b. What is the simplest way
to describe this subgroup? Can you prove your conjecture?


Exercise 4.8. Let S be a set and let $\cdot$ be an associative binary operation
on S with an identity element e. Thus we are saying that $(S, \cdot)$ satisfies
Group Axioms G0, G1, and G2 but not necessarily G3. Let
$T = \{ x \in S \mid \exists y \in S \text{ s.t. } x \cdot y = y \cdot x = e \}$.
(That is, T consists of all elements in S with inverses in S.)

(a) Let $\triangle = \cdot|_{T \times T}$. Prove that $(T, \triangle)$ is a
group.

(b) Why is it technically incorrect to say that $(T, \cdot)$ is a group? (We
will abuse notation this way sometimes in the future, however.)


Exercise 4.9. Must the union of two subgroups of a group G be a subgroup 
of G? 




Exercise 4.10. This exercise refers to Exercise 3.4. For an element
$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in S$, define
$|A| = ad - bc$.

(a) Prove that for all $A, B \in S$, we have $|A \cdot B| = |A| \cdot |B|$.
(Note that the $\cdot$ on the left side is the operation on S defined in
Exercise 3.4, while the $\cdot$ on the right is ordinary multiplication in
$\mathbb{R}$.)

(b) For an arbitrary element A of S such that $|A| \ne 0$, find an element C
of S such that $A \cdot C = e$, where e is the identity element found in
Exercise 3.4.

(c) Compute $|e|$.

(d) Prove that if $A \in S$ and $|A| = 0$, then there is no $C \in S$ such
that $A \cdot C = e$.

(e) Set $G = \{ A \in S : |A| \ne 0 \}$. Use the previous parts of this
exercise together with Exercise 4.8 to prove that $(G, \cdot)$ is a group.


Exercise 4.11. Let G and S be as in Exercise 4.10.
(a) Let $H = \{ A \in S : |A| = 1 \}$. Prove that $H \le G$.
(b) Find a subgroup of G of order 2. Hint: see Exercise 4.5.


Exercise 4.12. Let G be a group and let $g \in G$. Set
$H = \{ x \in G : g \cdot x = x \cdot g \}$. Note: we often denote H by
$C_G(g)$, and call it the centralizer of g in G.

(a) Prove that $H \le G$.

(b) Prove that $\langle g \rangle \le H$.


Exercise 4.13. Suppose that G is a group containing two elements x and y
such that $x \cdot y = y \cdot x$.

(a) Convince yourself that $\langle x, y \rangle$ must be abelian, and explain
why.

(b) Prove the result from part (a) formally.

(c) Generalize the result from part (b) as follows: let
$x_1, \ldots, x_n \in G$ and suppose that these elements commute pairwise (that
is, $x_i x_j = x_j x_i$ for all i, j). Prove that
$\langle x_1, \ldots, x_n \rangle$ is abelian.




5


Symmetry 


5.1 What is Symmetry? 


One application of group theory is to the study of symmetries of objects. But 
what do we mean by a “symmetry”? Consider a blank rectangular sheet of 
paper lying on my desk in front of you. If I wait until your back is turned and 
then sneakily turn the paper by exactly 180 degrees about its center, when you 
look again you will have no way to tell that I changed anything! The success 
of this parlor trick has something to do with the symmetry of the sheet. 

We proceed to give a mathematical treatment to the example above. Let us 
agree to put the origin (0,0) of a Cartesian coordinate system at the paper’s 
center. The action of rotating the sheet of paper by 180 degrees about its 
center can be captured by a function. Namely, consider the function
$f : \mathbb{R}^2 \to \mathbb{R}^2$ which takes a point P as input and rotates
P by 180 degrees about the origin. It is not hard to discover a formula
describing this function: namely, $f : (x, y) \mapsto (-x, -y)$.

We would like to say that this function f has no discernible effect on the 
sheet of paper. Clearly, f moves around individual points on the sheet, but, 
as a whole, f does not change the set of points of the sheet. 

The reader may have heard symmetry described as a property of an object, 
as in, “a human has bilateral symmetry” or “the function $y = x^2$ is symmetric
about the y-axis.” Our goal is to specify a “symmetry” of an object as a
freestanding thing unto itself, and moreover a standard type of mathematical
entity: we will say that a symmetry of an object is a function on the
underlying space which leaves our object, as a whole, unchanged.


Definition 5.1. Let $X \subseteq S$ be two sets. Then a symmetry of X with
respect to S is a bijective function $f : S \to S$ such that $f(X) = X$.


Remark 5.2. Recall how we define the image of a set under a function: namely,
$f(X) := \{ f(x) \mid x \in X \}$, so the image of X under f is the set of
images of all points of X. Thus, the definition of symmetry given above does
not assert that f takes every point of X to itself, but rather that f takes X
as a whole to itself.


Example 5.3. Let $S = \mathbb{R}^2$, let a, b > 0, and let
$X = [-a, a] \times [-b, b]$. Define a function $f : S \to S$ by
$f : (x, y) \mapsto (-x, -y)$. Then f is bijective, and we have


DOI: 10.1201/9781003252139-5




$P \in X$ iff $f(P) \in X$. So $f(X) = X$, and f is a symmetry of X with
respect to $\mathbb{R}^2$. Notice that X is just the rectangle of width 2a and
height 2b centered at the origin, and f is the symmetry of X described above in
our discussion about the sheet of paper.

Example 5.4. Let S be any non-empty set, and let f be any bijective function
from S to S. Then we have $f(S) = S$, since f is surjective. Therefore f is
a symmetry of S with respect to itself. We usually say simply that f is a
symmetry of S in this case. But recall that we defined a permutation of S to
be a bijective function from S to S. We conclude that Sym(S), the symmetric
group on S, is exactly the set of all symmetries of S!

In general, the set of all symmetries of X with respect to S is a subgroup 
of Sym(S) (Exercise 5.1). Often, we are interested only in those symmetries 
of an object which satisfy some additional conditions. For instance, we may 
desire that our sheet of paper should not have to be stretched, shrunk, or torn 
into pieces and re-assembled in order to apply a symmetry; something like 
rotation, on the other hand, would be reasonable. One way to capture this kind 
of requirement is to stipulate that a symmetry should preserve distances and 
angles; a symmetry with this property is sometimes called a rigid symmetry. 
A looser requirement of this nature would be simply to require a symmetry 
to be continuous. 


5.2 Dihedral Groups 


Fix an integer $n \ge 3$, and let $P_n$ be the regular n-sided polygon (or
“n-gon” for short) in $\mathbb{R}^2$ centered at the origin and with one vertex
at (1, 0). What are the rigid symmetries of $P_n$ with respect to
$\mathbb{R}^2$?

One rigid symmetry of $P_n$ is the rotation R about the origin by an angle
of $2\pi/n$ radians in the counterclockwise direction. The function R moves
each vertex of $P_n$ to the vertex next to it in counterclockwise order. Let us
derive a formula for R with respect to Cartesian coordinates.

Let P = (x, y) be a point in $\mathbb{R}^2$. To describe the action of R on P,
it is convenient to make use of polar coordinates. Supposing that the polar
coordinates of P are $(r, \theta)$, we have $x = r\cos(\theta)$,
$y = r\sin(\theta)$. For convenience, set $\alpha = 2\pi/n$.

Then the polar coordinates of R(P) are $(r, \theta + \alpha)$. We are
interested in the Cartesian coordinates $(x', y')$ of R(P). We have
$x' = r\cos(\theta + \alpha) = r\cos(\theta)\cos(\alpha) - r\sin(\theta)\sin(\alpha) = x\cos(\alpha) - y\sin(\alpha)$,
and
$y' = r\sin(\theta + \alpha) = r\sin(\theta)\cos(\alpha) + r\cos(\theta)\sin(\alpha) = y\cos(\alpha) + x\sin(\alpha)$.
Therefore we have

$$R(x, y) = (x\cos(\alpha) - y\sin(\alpha),\ x\sin(\alpha) + y\cos(\alpha)). \tag{5.1}$$
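
As a numerical sanity check of Equation (5.1), one can apply the formula to the vertices of a regular n-gon and confirm that each vertex lands on the next one in counterclockwise order. The Python sketch below does this in floating point with a small tolerance; the helper names `rotation`, `vertices`, and `close` are illustrative choices, not notation from the text.

```python
import math

def rotation(n):
    """Return R for the regular n-gon: counterclockwise rotation by 2*pi/n."""
    alpha = 2 * math.pi / n
    c, s = math.cos(alpha), math.sin(alpha)
    def R(point):
        x, y = point
        return (x * c - y * s, x * s + y * c)   # Equation (5.1)
    return R

def vertices(n):
    """Vertices of P_n, starting at (1, 0) and going counterclockwise."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]

def close(p, q, tol=1e-9):
    """Compare two points up to floating-point error."""
    return abs(p[0] - q[0]) < tol and abs(p[1] - q[1]) < tol

n = 5
R = rotation(n)
V = vertices(n)

# R should send vertex k to vertex k + 1 (indices taken mod n).
moved_ok = all(close(R(V[k]), V[(k + 1) % n]) for k in range(n))
print(moved_ok)
```

A check of this kind does not replace the derivation; it only guards against sign errors in the formula.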


In addition to its rotational symmetry, the polygon $P_n$ is also symmetric
about the x-axis. The corresponding symmetry of $P_n$ with respect to
$\mathbb{R}^2$ is the “flip” F about the x-axis, given by the formula

$$F(x, y) = (x, -y). \tag{5.2}$$


At this point, the reader should think about what other rigid symmetries
of $P_n$ may exist.
In fact, we assert (without proof; the reader is invited to confirm it) that
all of the rigid symmetries of $P_n$ can be obtained by composing some number
of rotations and flips in some order. What is the set of all symmetries that
can be obtained in this manner? Recall that in a symmetric group, the group
operation is composition of functions. Thus, our answer is exactly the subgroup
of Sym($\mathbb{R}^2$) generated by F and R.




Definition 5.5. Let $n \ge 3$ be an integer. The dihedral group $D_{2n}$ is the
subgroup of Sym($\mathbb{R}^2$) generated by F and R, where F is the flip about
the x-axis and R is counterclockwise rotation about the origin by $2\pi/n$
radians. That is,

$$D_{2n} = \langle F, R \rangle.$$


We shall see shortly the reason for using 2n instead of n in our notation. 

Our immediate goal is to understand the group $D_{2n}$. A typical element
of $D_{2n}$ is a product of terms of the form F, R, $F^{-1}$, and $R^{-1}$.
Thus, some examples of elements of $D_{2n}$ are F, R, $F \circ R$,
$F \circ R^{-1}$, $R \circ R \circ F \circ R \circ F \circ R^{-1}$, etc.

As a start, let us analyze $F^{-1}$ and $R^{-1}$. These are just the inverse
functions of F and R, respectively. So $R^{-1}$ is the clockwise rotation about
the origin by $2\pi/n$ radians; this is the same as the counterclockwise
rotation by $-2\pi/n$ radians. An easy way to obtain a formula for $R^{-1}$ is
to substitute $-\alpha$ for $\alpha$ in the formula for R. We find
$R^{-1}(x, y) = (x\cos(-\alpha) - y\sin(-\alpha),\ x\sin(-\alpha) + y\cos(-\alpha))$.
Taking advantage of the fact that sine is odd and cosine is even, we get

$$R^{-1}(x, y) = (x\cos(\alpha) + y\sin(\alpha),\ y\cos(\alpha) - x\sin(\alpha)). \tag{5.3}$$


Notice that applying F twice brings every point of $\mathbb{R}^2$ back to its
original position. In other words, $F \circ F$ is the identity function on
$\mathbb{R}^2$; but this is also the identity element of Sym($\mathbb{R}^2$).
Therefore we have $F^{-1} = F$, and

$$F^2 = e. \tag{5.4}$$


Next, we consider the cyclic subgroup generated by each element F and R
separately. Since $F^2 = e$, we have $F^3 = F^2 \circ F = e \circ F = F$,
$F^4 = F^3 \circ F = F \circ F = F^2 = e$, etc. We also have $F^0 = e$ (by
Definition 3.38). Further, for every positive integer k, we have
$F^{-k} = (F^{-1})^k$ (by the Laws of Exponents) $= F^k$ (since
$F^{-1} = F$). We conclude that for any integer k, we have

$$F^k = \begin{cases} e & \text{if $k$ is even;} \\ F & \text{if $k$ is odd.} \end{cases}$$




In particular, $\langle F \rangle = \{e, F\}$.

We now consider $\langle R \rangle = \{ R^k \mid k \in \mathbb{Z} \}$. First
let k be a positive integer. Then $R^k$, the composition of k factors of R, is
just a counterclockwise rotation by $2k\pi/n$ radians. Also, $R^{-k}$ is the
inverse of $R^k$, which is a counterclockwise rotation by $-2k\pi/n$ radians.

We conclude that for all integers k, $R^k$ is a counterclockwise rotation by
$2k\pi/n$ radians. These powers of R start repeating when k = n. Indeed, $R^n$
is a rotation by exactly $2\pi$ radians or 360 degrees, so we have

$$R^n = e. \tag{5.5}$$


(Exercise 5.4 asks you to formally verify this assertion.) 

It follows that $R^{n+1} = R^n \circ R = e \circ R = R$,
$R^{n+2} = R^n \circ R^2 = e \circ R^2 = R^2$, etc.; so we only need the first
n non-negative powers of R to capture all positive powers of R. For negative
powers, we observe that $R \circ R^{n-1} = R^n = e$, and so we must have
$R^{-1} = R^{n-1}$. From this, we see that all the negative powers of R can be
written as positive powers of R. We conclude that
$\langle R \rangle = \{e, R, R^2, \ldots, R^{n-1}\}$.

We interrupt this discussion of $D_{2n}$ to record and prove a general version
of what we have just learned about $\langle R \rangle$.


Lemma 5.6. Let G be a group, let $x \in G$, and suppose that $x^k = e$ for some
positive integer k. Then we have
$\langle x \rangle = \{e, x, x^2, x^3, \ldots, x^{k-1}\}$.


Proof. Set $H = \{e, x, x^2, x^3, \ldots, x^{k-1}\}$. Since
$\langle x \rangle = \{ x^j \mid j \in \mathbb{Z} \}$, we certainly have
$H \subseteq \langle x \rangle$. On the other hand, if
$y \in \langle x \rangle$, then we can write $y = x^j$ for some integer j; and
in turn, by the Remainder Theorem for integer division, we can write
$j = qk + r$ for integers q and r such that $0 \le r \le k - 1$, to get
$y = x^j = x^{qk+r} = x^{qk} x^r = (x^k)^q x^r = e^q x^r = e x^r = x^r$, so
$y \in H$. (This last series of equalities used the Laws of Exponents and also
Exercise 3.3.)
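
The reduction $x^j = x^r$ with $j = qk + r$ can be observed concretely. In the Python sketch below, group elements are modelled as permutations of {0, 1, 2, 3} stored as tuples; this representation and the helper names `compose`, `inverse`, and `power` are assumptions made for illustration. For a positive modulus k, Python's `%` operator returns the remainder r with $0 \le r \le k - 1$, even when j is negative, which matches the Remainder Theorem as used in the proof.

```python
def compose(p, q):
    """Composition p∘q of permutations of {0, ..., m-1} stored as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def power(p, j):
    """p^j for any integer j, using p^(-j) = (p^(-1))^j."""
    base = p if j >= 0 else inverse(p)
    result = tuple(range(len(p)))      # identity permutation
    for _ in range(abs(j)):
        result = compose(result, base)
    return result

x = (1, 2, 3, 0)          # a 4-cycle, so x^4 = e
k = 4
e = tuple(range(4))
assert power(x, k) == e

# Every integer power agrees with the power given by the remainder mod k.
reduces = all(power(x, j) == power(x, j % k) for j in range(-12, 13))

# So the cyclic subgroup <x> has at most k elements.
distinct_powers = {power(x, j) for j in range(-12, 13)}
print(reduces, len(distinct_powers))
```

In this example the k listed powers happen to be distinct, so the subgroup has exactly k elements; the lemma by itself only gives the upper bound.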


Warning 5.7. Lemma 5.6 shows that the order of $\langle x \rangle$ is at most
k. But the elements $e, x, x^2, x^3, \ldots, x^{k-1}$ need not be distinct.
Thus we may have $|\langle x \rangle| < k$. However, if we take k to be the
smallest positive integer with the property of the Lemma, then we do have
$|\langle x \rangle| = k$; see Exercise 5.7.


The next step in our quest to understand $D_{2n}$ is to find a relationship
between F and R. Suppose that we take a regular pentagon (n = 5) and label
its vertices in counterclockwise order 0, 1, 2, 3, 4, starting with the vertex
at (1, 0). After applying the rotation R, vertex 4 will be at (1, 0), and the
new vertex order is 4, 0, 1, 2, 3. If we next apply the flip F, the final
vertex order will be 4, 3, 2, 1, 0.

On the other hand, suppose we start with the original order 0, 1, 2, 3, 4,
but first apply F. We get the new order 0, 4, 3, 2, 1. We can achieve the order
4, 3, 2, 1, 0 from here by rotating once in the clockwise direction: that is,
by applying $R^{-1}$.

This discussion seems to indicate that

$$FR = R^{-1}F. \tag{5.6}$$




We proceed to prove this equation rigorously in the general case.

Claim 5.8. For any $n \ge 3$, we have $FR = R^{-1}F$ in the dihedral group
$D_{2n}$.


Proof. [FR and $R^{-1}F$ are functions] [Two functions are equal iff they have
the same domain and they give the same output for each input: so we must
prove $\forall P \in \mathbb{R}^2, (FR)(P) = (R^{-1}F)(P)$] Let
$P \in \mathbb{R}^2$, and write P = (x, y). Then

$$\begin{aligned}
(FR)(P) &= F(R(P)) \\
&= F(x\cos(\alpha) - y\sin(\alpha),\ x\sin(\alpha) + y\cos(\alpha)) \\
&= (x\cos(\alpha) - y\sin(\alpha),\ -x\sin(\alpha) - y\cos(\alpha)),
\end{aligned}$$

and

$$\begin{aligned}
(R^{-1}F)(P) &= R^{-1}(F(P)) \\
&= R^{-1}(x, -y) \\
&= (x\cos(\alpha) - y\sin(\alpha),\ -y\cos(\alpha) - x\sin(\alpha)).
\end{aligned}$$
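
The identity $FR = R^{-1}F$ can also be spot-checked numerically by evaluating both sides at many points. The Python sketch below uses Equations (5.1), (5.2), and (5.3) directly; the function names `make_maps` and `relation_holds`, the sampling range, and the tolerance are assumptions made here. It is a sanity check in floating point, not a substitute for the proof.

```python
import math
import random

def make_maps(n):
    """Return (F, R, R_inv) for the regular n-gon, as functions on points of R^2."""
    alpha = 2 * math.pi / n
    c, s = math.cos(alpha), math.sin(alpha)

    def F(p):                      # flip about the x-axis, Equation (5.2)
        x, y = p
        return (x, -y)

    def R(p):                      # rotation by alpha, Equation (5.1)
        x, y = p
        return (x * c - y * s, x * s + y * c)

    def R_inv(p):                  # rotation by -alpha, Equation (5.3)
        x, y = p
        return (x * c + y * s, y * c - x * s)

    return F, R, R_inv

def relation_holds(n, trials=1000, tol=1e-9):
    """Compare F(R(P)) with R_inv(F(P)) at pseudo-random points P."""
    F, R, R_inv = make_maps(n)
    rng = random.Random(0)         # fixed seed so the check is repeatable
    for _ in range(trials):
        P = (rng.uniform(-10, 10), rng.uniform(-10, 10))
        lhs, rhs = F(R(P)), R_inv(F(P))
        if abs(lhs[0] - rhs[0]) > tol or abs(lhs[1] - rhs[1]) > tol:
            return False
    return True

print(all(relation_holds(n) for n in range(3, 13)))
```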


We will say that Equations 5.4, 5.5, and 5.6 are “relations” involving F 
and R. More generally, we make the following definition, which simply says 
that we will use the term “relation” to mean any equation involving group 
elements. This use of the word “relation” should not be confused with that in 
Definition 8.3. 


Definition 5.9. Let G be a group. A relation in G is an equation of the form

$$x_1 x_2 \cdots x_m = y_1 y_2 \cdots y_n$$

where $x_1, \ldots, x_m, y_1, \ldots, y_n \in G$.


The next result and its corollary tell us a lot about the structure of
$D_{2n}$. We state these results at a level of generality that goes somewhat
beyond the dihedral groups, in consideration of a future application. (We keep
the notation F and R, so that these symbols could, but do not necessarily,
represent the flip and rotation of $D_{2n}$ in the result below.)


Proposition 5.10. Suppose that G is a group generated by two elements F
and R which satisfy the relations $F^2 = e$, $R^n = e$, and $FR = R^{-1}F$
(where n is a positive integer). Then

$$G = \{ e, R, R^2, R^3, \ldots, R^{n-1},\ F, FR, FR^2, FR^3, \ldots, FR^{n-1} \}.$$


Proof. Let $S = \{e, R, R^2, \ldots, R^{n-1}, F, FR, FR^2, \ldots, FR^{n-1}\}$.
Then certainly $S \subseteq G$. Our strategy will be to show that $S \le G$.
Once we know this, we can argue as follows: S is a subgroup of G which contains
F and R; but $G = \langle F, R \rangle$, so G is the smallest subgroup of G
which contains F and R; therefore, $G \subseteq S$, and so S = G.

It remains to show that $S \le G$. We use the Subgroup Test.

ST0: $e \in S$, so S is non-empty.

ST1: Let $x \in S$. [Show $x^{-1} \in S$] Then we can write $x = F^a R^b$ for
some $a \in \{0, 1\}$ and $b \in \{0, 1, \ldots, n-1\}$. So
$x^{-1} = R^{-b} F^{-a}$ (by Exercise 3.15). We consider two cases:

Case 1: a = 0.

Then $x^{-1} = R^{-b} \in \langle R \rangle$. But
$\langle R \rangle = \{e, R, R^2, \ldots, R^{n-1}\}$ (by Lemma 5.6). So
$\langle R \rangle \subseteq S$, and thus $x^{-1} \in S$.

Case 2: a = 1.

Then $x^{-1} = R^{-b} F^{-1} = R^{-b} F$ (since $F = F^{-1}$) $= F R^b$ (by
Exercise 3.5). So $x^{-1} = x \in S$.

ST2: Let $x, y \in S$. [Show $xy \in S$] Then we can write $x = F^a R^b$ and
$y = F^c R^d$ for some integers a, b, c, d with $a, c \in \{0, 1\}$ and
$b, d \in \{0, 1, 2, \ldots, n-1\}$. So $xy = F^a R^b F^c R^d$. We break into
two cases according to the value of c:

Case 1: c = 0.

Then $xy = F^a R^b R^d = F^a R^{b+d}$. Now $R^{b+d} \in \langle R \rangle$, so,
by Lemma 5.6, we can write $R^{b+d} = R^j$ for some integer j with
$0 \le j \le n-1$. Thus $xy = F^a R^j \in S$.

Case 2: c = 1.

Then $xy = F^a R^b F R^d = F^a F R^{-b} R^d = F^{a+1} R^{d-b}$. Using Lemma 5.6
twice, we can write $F^{a+1} = F^k$ for some $k \in \{0, 1\}$ and
$R^{d-b} = R^j$ for some $j \in \{0, 1, 2, \ldots, n-1\}$. Therefore
$xy = F^k R^j \in S$.


Corollary 5.11. Suppose G, F, and R are as in Proposition 5.10. Then the
group operation of G is given by

$$(F^a R^b)(F^c R^d) = F^{a+c} R^{d + (-1)^c b} \tag{5.7}$$

for $a, c \in \{0, 1\}$ and $b, d \in \{0, 1, 2, \ldots, n-1\}$.


Proof. This follows by inspection from the two cases c = 0 and c = 1 of the 
proof of ST2 above. 
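
Formula (5.7) turns multiplication in G into arithmetic on exponent pairs. The Python sketch below encodes the element $F^a R^b$ as the pair (a, b) and reduces the exponents modulo 2 and modulo n, which is justified by the relations $F^2 = e$ and $R^n = e$. The encoding and the name `d2n_product` are assumptions made for illustration; distinct pairs are treated as distinct symbols here, which the corollary itself does not assert.

```python
from itertools import product

def d2n_product(n):
    """Product rule (5.7) on pairs (a, b), where (a, b) stands for F^a R^b."""
    def mul(x, y):
        a, b = x
        c, d = y
        # Reduce exponents using F^2 = e and R^n = e.
        return ((a + c) % 2, (d + (-1) ** c * b) % n)
    return mul

n = 5
mul = d2n_product(n)
e, F, R = (0, 0), (1, 0), (0, 1)
R_inv = (0, n - 1)
elements = list(product(range(2), range(n)))

assert mul(F, F) == e                    # F^2 = e
assert mul(F, R) == mul(R_inv, F)        # FR = R^{-1}F
# The rule is associative on all 2n pairs, as a group operation must be.
assert all(mul(mul(x, y), z) == mul(x, mul(y, z))
           for x, y, z in product(elements, repeat=3))

print(mul(R, F))   # the pair for RF, which equals F R^{n-1}
```

Working with pairs avoids composing functions on $\mathbb{R}^2$ altogether: only the three relations are used.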


The next result completes our description of $D_{2n}$ for the moment.

Theorem 5.12. For any integer $n \ge 3$, the order of $D_{2n}$ is 2n.


Proof. By Proposition 5.10, the order of $D_{2n}$ is at most 2n. [Strategy:
Prove that the elements
$e, R, R^2, R^3, \ldots, R^{n-1}, F, FR, FR^2, FR^3, \ldots, FR^{n-1}$ are
distinct in $D_{2n}$. To prove that a list has distinct entries, we prove that
if two list entries are equal then their indexes are equal] Suppose that
$F^a R^b = F^c R^d$ with $a, c \in \{0, 1\}$ and
$b, d \in \{0, 1, 2, \ldots, n-1\}$, and assume without loss of generality
that $b \le d$. [Show: a = c and b = d] Then we have $F^{a-c} = R^{d-b}$.
We break into cases based on the value of $F^{a-c}$. The two cases
$F^{a-c} = e$ and $F^{a-c} = F$ are exhaustive, by Lemma 5.6.




Case 1: $F^{a-c} = e$.

Then $R^{d-b} = e$. Since $0 \le b \le d \le n-1$, we have $0 \le d-b < n$. It follows
that $b = d$. (For a formal algebraic proof of this last implication, see Exercise
5.4.)

Also, we have $F^a = F^c$, and $a, c \in \{0, 1\}$. We suppose for a contradiction
that $a \ne c$. Then we have $F^0 = F^1$, so $F = e$. But $F$ is not the identity
function on $\mathbb{R}^2$. Therefore, $a = c$.

Case 2: $F^{a-c} = F$.

Then $R^{d-b} = F$. Let $k = d-b$. Then $0 \le k \le n-1$. Evaluating
both functions in the equation $R^k = F$ at the point $(1, 0)$ gives
$(\cos(2k\pi/n), \sin(2k\pi/n)) = (1, 0)$ (see Exercise 5.4). But $0 \le 2k\pi/n < 2\pi$,
so we must have $k = 0$, and $b = d$. But now we have $F = R^{d-b} = R^0 = e$, a
contradiction. Therefore, Case 2 is impossible.
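As a numerical sanity check on Theorem 5.12 (not a substitute for the proof), one can build the $2n$ maps $F^a R^b$ and count how many distinct images a test point has. The sketch below makes two assumptions that should be compared with Equations 5.1 and following: $R$ is taken to be rotation by $2\pi/n$ about the origin and $F$ to be reflection across the $x$-axis, with the plane identified with the complex numbers. The test point `z0` and the function names are arbitrary choices of ours.

```python
import cmath


def dihedral_maps(n):
    """Return the maps F^a R^b as functions on the complex plane, keyed by (a, b).

    Assumption: R is rotation by 2*pi/n, F is complex conjugation.
    """
    omega = cmath.exp(2j * cmath.pi / n)
    maps = {}
    for a in (0, 1):
        for b in range(n):
            def g(z, a=a, b=b):
                w = z * omega ** b                  # apply R^b first
                return w.conjugate() if a else w    # then apply F^a
            maps[(a, b)] = g
    return maps


def count_distinct(n, z0=complex(1.0, 0.3)):
    """Count the distinct images of the test point z0 under the 2n maps."""
    images = set()
    for g in dihedral_maps(n).values():
        value = g(z0)
        images.add((round(value.real, 9), round(value.imag, 9)))
    return len(images)


print(count_distinct(6))
```

The test point is deliberately chosen off every axis of reflection; a point such as $(1, 0)$, which $F$ fixes, could not separate all $2n$ maps.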


Remark 5.13. At last we see the reason for the $2n$ in the notation: $D_{2n}$ is the
dihedral group of order $2n$.


We conclude this section with a meditation on the dihedral groups. Originally,
we conceived of $D_{2n}$ as a (mysterious) collection of symmetries of a
regular $n$-gon, whose elements are functions from $\mathbb{R}^2$ to $\mathbb{R}^2$. Now we have
come to realize that $D_{2n}$ can be described as a collection of certain combinations
of the functions $R$ and $F$, along with a certain group operation. We shall
see that there is a sense in which $D_{2n}$ can be described as the group generated
by the symbols $R$ and $F$ subject to the three relations $F^2 = e$, $R^n = e$, and
$FR = R^{-1}F$. The next sections will allow us to make this idea precise.


5.3 Exercises 


Exercise 5.1. Let $S$ be a non-empty set and let $X \subseteq S$. Let $\mathrm{Sym}_X(S)$ denote
the set of all symmetries of $X$ with respect to $S$. Prove that $\mathrm{Sym}_X(S)$ is a
subgroup of $\mathrm{Sym}(S)$.

Exercise 5.2. Suppose that $T$ is a subset of the $x,y$-plane which is symmetric
about the line $y = x$ (in the language of high-school algebra).

(a) Find a function $f : \mathbb{R}^2 \to \mathbb{R}^2$ such that $f$ is a symmetry of $T$ with
respect to $\mathbb{R}^2$ and $f$ is not the identity function on $\mathbb{R}^2$.

(b) What is $\langle f \rangle$?

Exercise 5.3. Using Equations 5.1 and 5.3, verify algebraically that $R \circ R^{-1}$
and $R^{-1} \circ R$ are both equal to the identity function on $\mathbb{R}^2$.


Exercise 5.4. (a) Starting with Equation 5.1, use induction to prove the formula
$R^k(x, y) = (x\cos(k\alpha) - y\sin(k\alpha),\ x\sin(k\alpha) + y\cos(k\alpha))$ for all positive
integers $k$.

(b) Use the result of part (a) to show that $R^n = e$, but $R^k \ne e$ if $1 \le k < n$.


Exercise 5.5. Let $x \in D_{2n}$ for some integer $n \ge 3$.

(a) Prove that if $x = FR^k$ for some $k \in \{0, 1, 2, \ldots, n-1\}$, then $\langle x \rangle$ has
order 2.

(b) Prove that if $x = R^k$ for some $k \in \{0, 1, 2, \ldots, n-1\}$, then $\langle x \rangle$ has
order at most $n$.


Exercise 5.6. Use the following steps to prove that every subgroup of a cyclic
group is also cyclic. Suppose that $G = \langle a \rangle$ is a cyclic group generated by an
element $a \in G$. Let $H \le G$ with $H \ne \{e\}$. Set $E = \{j \in \mathbb{Z} : j \ge 1 \text{ and } a^j \in H\}$.

(a) Prove that $E$ is non-empty.

(b) Why must $E$ have a smallest element?

(c) Let $m$ be the smallest element of $E$. Prove that $H = \langle a^m \rangle$. Hint: Use
the Remainder Theorem for integer division.


Exercise 5.7. Let $G$ be a group, let $a \in G$, and suppose that $a^k = e$ for some
positive integer $k$. Let $m = \min\{n \in \mathbb{Z} : n \ge 1 \text{ and } a^n = e\}$ be the smallest
positive integer with this property.

(a) Prove that $|\langle a \rangle| = m$.

(b) Prove that for any integer $j$, we have $a^j = e$ iff $m \mid j$.

(c) Deduce that for two integers $i$ and $j$, we have $a^i = a^j$ iff $i \equiv j \pmod{m}$.
Recall that we write $i \equiv j \pmod{m}$ to mean $m$ divides $i - j$, and we read the
former expression as “$i$ is congruent to $j$ modulo $m$.”


Exercise 5.8. Let $G$ be a group and let $a \in G$. Prove that if $G$ is finite, then
there exists a positive integer $k$ such that $a^k = e$.


Exercise 5.9. Let $S$ be a non-empty set. Let $X \subseteq S$ and let $H \le \mathrm{Sym}(S)$. Let
$Y = \{f(a) \mid f \in H \text{ and } a \in X\}$. Let $G$ be the group of symmetries of $Y$ with
respect to $S$.

(a) Prove that $G$ contains $H$.

(b) Show by an example that $G$ may be strictly bigger than $H$.

(c) Consider the special case $S = \mathbb{R}^2$, and let $X$ be the line segment from
$(0, 1)$ to $(1, 0)$. Make a sketch of the corresponding set $Y$ when $H = D_8$. Note:
We view X as an arbitrary “seed” (or starting point) object, and H as a group 
of symmetries that we want to force upon X, resulting in the symmetric object 
Y. This process reminds us of the operation of a kaleidoscope, where X is a 
real object with no particular symmetries (say, a collection of bits of colored 
plastic), and Y is the image we actually see in the kaleidoscope. The principle 
that by applying every possible symmetry in some group H to an arbitrary 
object we get a highly symmetric object (invariant under H) we refer to as 
the Kaleidoscope Principle. 


6 


Free Groups 


6.1 The Free Group Generated by a Set 


Suppose we have a non-empty set $S$. For the sake of definiteness, let us take
a 2-element set, $S = \{x, y\}$. What does $\langle S \rangle$ look like? Well, $\langle S \rangle$ contains $x$,
$y$, $x^{-1}$, $y^{-1}$, as well as $xy^{-1}$, $x^{-1}y^{-1}$, and so on: that is, $\langle S \rangle$ is the set of all
“words” formed out of the symbols $x$, $y$, $x^{-1}$, and $y^{-1}$.

But wait a moment—we never said that S was a subset of any group! 

Well, we can run with this description of $\langle S \rangle$, and actually create a group
out of these “words.” Of course, in the resulting group, we expect inverses to 
cancel each other; but otherwise, we will not impose any relations among the 
elements. 


Definition 6.1. Let $S$ be a non-empty set. A word on $S$ is an ordered sequence
$w$ of the form $w = s_1^{e_1} s_2^{e_2} \cdots s_k^{e_k}$, where $k$ is a non-negative integer (called the
length of $w$), $s_i \in S$, and $e_i \in \{1, -1\}$ for each $i$. We also write $s_i$ for $s_i^1$. The
word $w$ is called reduced if it does not contain two adjacent terms of the form
$t t^{-1}$ or $t^{-1} t$.


Remark 6.2. A word is to be thought of as merely a succession of symbols. 
Two reduced words are equal if and only if they appear equal, i.e., if they 
consist of exactly the same elements of S to the same powers, written in the 
same order. 


Remark 6.3. Notice that we allowed a word to consist of zero terms (k = 0). 
We call such a word the empty word, and denote it by e. 


Example 6.4. The word $xyx^{-1}y^{-1}yyy$ is not reduced, but we can “reduce” it
to get the word $xyx^{-1}yy$.


In general, reducing a word consists of eliminating all adjacent pairs which
prevent the word from being reduced: that is, eliminating everything of the
form $t t^{-1}$ and $t^{-1} t$, repeatedly, until no more such pairs remain.

To “multiply” two words, we simply write them one after the other, and
then reduce. For example, we have $(xyx^{-1}y^{-1}) \cdot (yyy) = xyx^{-1}y^{-1}yyy =
xyx^{-1}yy$. The formal name for placing two words side by side to make a new
word is concatenation (from the Latin word for “chain”).
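Reduction and multiplication of words are mechanical enough to be tried out in a few lines of code. The sketch below is one possible encoding, chosen by us for illustration: a word is a list of pairs `(symbol, exponent)` with exponent `1` or `-1`, and reduction is carried out in a single left-to-right pass using a stack. The names `reduce_word`, `multiply`, and `show` are hypothetical helpers, not notation from the text.

```python
def reduce_word(word):
    """Cancel adjacent pairs t t^-1 and t^-1 t until none remain."""
    out = []
    for sym, exp in word:
        if out and out[-1] == (sym, -exp):
            out.pop()                 # the new letter cancels the previous one
        else:
            out.append((sym, exp))
    return out


def multiply(u, v):
    """Concatenate two words, then reduce."""
    return reduce_word(list(u) + list(v))


def show(word):
    """Render a word as text; the empty word is shown as e."""
    return "".join(s if e == 1 else s + "^-1" for s, e in word) or "e"


x, y = ("x", 1), ("y", 1)
X, Y = ("x", -1), ("y", -1)           # X and Y stand for x^-1 and y^-1

print(show(multiply([x, y, X, Y], [y, y, y])))   # xyx^-1yy
```

The stack works because a cancellation can only expose a new cancelling pair at the current end of the partially reduced word, so no second pass is needed.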


DOI: 10.1201/9781003252139-6


Definition 6.5. Let $S$ be a non-empty set. The free group on $S$ is the pair
$(\mathrm{Fr}(S), \cdot)$, where $\mathrm{Fr}(S)$ is the set of all reduced words on $S$, and $\cdot$ is the
operation of concatenation followed by reduction.


Remark 6.6. As we would expect, we have $xyx^{-1}y^{-1} \cdot yyy = xyx^{-1}yy$ in the
free group on $\{x, y\}$. However, if we are careful to always write the $\cdot$ when
multiplying, then we technically cannot write $xyx^{-1}y^{-1}yyy = xyx^{-1}yy$ in
this group because $xyx^{-1}y^{-1}yyy$ is not even an element of the group: it is not
reduced!


Note that $(\mathrm{Fr}(S), \cdot)$ really is a group; we give an informal demonstration
that the group axioms are satisfied:

G1: Concatenation is associative, and reductions can be performed in any
order without affecting the final result. To compute either $(w_1 \cdot w_2) \cdot w_3$ or
$w_1 \cdot (w_2 \cdot w_3)$, the concatenation $w_1 w_2 w_3$ may as well be done first, followed
by all the reductions. Thus the two expressions are equal.

G2: The empty word, $e$, is an identity element of $(\mathrm{Fr}(S), \cdot)$. For concatenating
a word with the empty word does not change the original word.

G3: The inverse of a reduced word is another reduced word: we have

$$(s_1^{e_1} s_2^{e_2} \cdots s_k^{e_k})^{-1} = s_k^{-e_k} \cdots s_2^{-e_2} s_1^{-e_1},$$

so if $s_i^{e_i}$ and $s_{i+1}^{e_{i+1}}$ are never inverses of each other, then $s_{i+1}^{-e_{i+1}}$ and $s_i^{-e_i}$ are
never inverses of each other.
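The inverse formula in G3 can be checked on examples in the same spirit. This is a minimal sketch under our own encoding (a word is a list of `(symbol, exponent)` pairs with exponent `1` or `-1`); the helper names are hypothetical.

```python
def reduce_word(word):
    """Cancel adjacent inverse pairs in a single stack-based pass."""
    out = []
    for sym, exp in word:
        if out and out[-1] == (sym, -exp):
            out.pop()
        else:
            out.append((sym, exp))
    return out


def inverse(word):
    """Reverse the order of the letters and negate every exponent."""
    return [(s, -e) for s, e in reversed(word)]


w = [("x", 1), ("y", 1), ("x", -1), ("y", 1), ("y", 1)]   # x y x^-1 y y

print(reduce_word(w + inverse(w)))              # the empty word: []
print(reduce_word(inverse(w)) == inverse(w))    # the inverse is already reduced
```

The first line of output illustrates that the word built by the formula really does cancel against $w$; the second illustrates the claim that inverting a reduced word gives a reduced word.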

Note that the free group on $S$ is generated by the elements of $S$. So we really
did find $\langle S \rangle$ in some sense, just as we originally set out to do, even though $S$
was not originally assumed to be inside of any group! Namely, $\langle S \rangle = \mathrm{Fr}(S)$.

Remark 6.7. A free group such as $\mathrm{Fr}(\{x, y\})$ is “free” in the sense that its
elements have no non-trivial relations among themselves: that is, no relations
that are not forced by cancellation of symbols with their inverses. A free group
is “relation-free.”


By contrast, the dihedral group $D_{2n} = \langle F, R \rangle$ does have non-trivial relations,
such as $FR = R^{-1}F$ and $FF = e$.


6.2 Exercises 


Exercise 6.1. Let $w$ be a word on the set $\{x, y\}$. Prove that the reduced form
of $w$ is independent of the order in which reductions are performed.


Exercise 6.2. Give a detailed proof that $\mathrm{Fr}(\{x, y\})$ is a group with the operation
of concatenation followed by reduction.

Exercise 6.3. Let $S$ be a finite set of size $n$. Let $k \in \mathbb{N}$.

(a) How many words on $S$ have length $k$?
(b) How many reduced words on $S$ have length $k$?


7 


Group Homomorphisms 


7.1 Relationships between Groups 


Even though the two groups $\mathrm{Fr}(\{x, y\}) = \langle x, y \rangle$ and $D_{2n} = \langle F, R \rangle$ are very
different from each other, we sense that a relationship exists between them,
since they can both be generated by two elements. To be specific, consider
the correspondence $e \leftrightarrow e$, $x \leftrightarrow F$, $y \leftrightarrow R$, $xy \leftrightarrow FR$, $xyx^{-1} \leftrightarrow FRF^{-1}$,
$xy^2 \leftrightarrow FR^2$, etc., where we associate a word on $\{x, y\}$ with the corresponding
“word” on $\{F, R\}$.

This correspondence is not one-to-one, since different elements of $\mathrm{Fr}(\{x, y\})$
can correspond to the same element of $D_{2n}$: for instance, we have $e \leftrightarrow e$ and
$x^2 \leftrightarrow F^2 = e$, but $e \ne x^2$ in $\mathrm{Fr}(\{x, y\})$. The correspondence does, however,
give us a function

$$w : \mathrm{Fr}(\{x, y\}) \to D_{2n} \tag{7.1}$$

sending a word on $\{x, y\}$ to the element of $D_{2n}$ obtained by replacing $x$ with
$F$ and $y$ with $R$. This function expresses a natural relationship between these
two groups. For ease of notation, set $H = \mathrm{Fr}(\{x, y\})$.

Let us explore what makes this function so natural. We could say that $w$
is natural because it is simply a substitution rule that replaces $x$ by $F$ and
$y$ by $R$. While this is true, we seek a more “functional” explanation, which
involves the relationship of $w$ to the group laws of the two groups in question.

To apply a group law, we need two group elements. So let us choose two
elements $a$ and $b$ in $H$. For example, set $a = xyx^{-1}$ and $b = yyx$. Then we
have $w(a) = FRF^{-1}$ and $w(b) = RRF$. Playing around with products, we see
that $w(a \cdot b) = w(xyx^{-1}yyx) = FRF^{-1}RRF$, which is the same as $w(a) \cdot w(b)$.
This phenomenon occurs for arbitrary elements $a$ and $b$ in $H$, as you should
convince yourself; it forms the basis of the following definition.
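The compatibility of the substitution map with the two group laws can be tested on examples. The sketch below rests on several assumptions of ours: words are lists of `(symbol, exponent)` pairs, the element $F^a R^b$ of $D_{2n}$ is stored as the pair `(a, b)`, products in $D_{2n}$ are computed with the formula of Equation 5.7, and the function name `substitute` stands in for the map of Equation 7.1.

```python
def reduce_word(word):
    """Cancel adjacent inverse pairs; exponents are +1 or -1."""
    out = []
    for sym, exp in word:
        if out and out[-1] == (sym, -exp):
            out.pop()
        else:
            out.append((sym, exp))
    return out


def dihedral_mul(p, q, n):
    """(F^a R^b)(F^c R^d) = F^(a+c) R^(d + (-1)^c b), with F^a R^b stored as (a, b)."""
    (a, b), (c, d) = p, q
    return ((a + c) % 2, (d + (-1) ** c * b) % n)


def substitute(word, n):
    """Replace x by F and y by R, then multiply out in D_2n."""
    image = {("x", 1): (1, 0), ("x", -1): (1, 0),       # F^-1 = F
             ("y", 1): (0, 1), ("y", -1): (0, n - 1)}   # R^-1 = R^(n-1)
    result = (0, 0)                                      # the identity e
    for letter in word:
        result = dihedral_mul(result, image[letter], n)
    return result


n = 7
a = [("x", 1), ("y", 1), ("x", -1)]          # a = x y x^-1
b = [("y", 1), ("y", 1), ("x", 1)]           # b = y y x
lhs = substitute(reduce_word(a + b), n)      # image of the product a . b
rhs = dihedral_mul(substitute(a, n), substitute(b, n), n)
print(lhs == rhs)
```

The same code also shows why the map is not one-to-one: `substitute([("x", 1), ("x", 1)], n)` returns the identity pair `(0, 0)` although the word $x^2$ is reduced and non-empty.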


Definition 7.1. Let $(G_1, \cdot_1)$ and $(G_2, \cdot_2)$ be groups. A group homomorphism
from $G_1$ to $G_2$ is a function $\sigma : G_1 \to G_2$ such that

$$\forall a, b \in G_1, \quad \sigma(a \cdot_1 b) = \sigma(a) \cdot_2 \sigma(b).$$


Remark 7.2. We arrived at the notion of group homomorphism by noting a 
relationship which appeared to exist between two particular groups. In general, 
the fundamental way (some might say the only way) that two groups can be 


DOI: 10.1201/9781003252139-7


related is via a group homomorphism from one to the other. In the same vein, 
we seldom if ever care about any function from a group to another group 
unless that function is a group homomorphism. 


Example 7.3. The function $w$ in Equation 7.1 is a group homomorphism from
$\mathrm{Fr}(\{x, y\})$ to $D_{2n}$.


We shall see several more examples in the following section. 

Next we establish the result which says that a group homomorphism re- 
spects identity elements, inverses, and products of finitely many factors to 
arbitrary integer powers. 


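Proposition 7.4. Let $\sigma : G_1 \to G_2$ be a group homomorphism, and let $e_i$
be the identity element of $G_i$. Then we have:

(i) $\sigma(e_1) = e_2$.

(ii) For all $a \in G_1$, $\sigma(a^{-1}) = (\sigma(a))^{-1}$.

(iii) $\sigma(a_1^{n_1} a_2^{n_2} \cdots a_k^{n_k}) = (\sigma(a_1))^{n_1} (\sigma(a_2))^{n_2} \cdots (\sigma(a_k))^{n_k}$ for all $a_1, a_2,
\ldots, a_k$ in $G_1$ and all $n_1, n_2, \ldots, n_k$ in $\mathbb{Z}$.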


Proof. (i) We have $\sigma(e_1 \cdot e_1) = \sigma(e_1) \cdot \sigma(e_1)$ since $\sigma$ is a group homomorphism.
Also, $e_1 \cdot e_1 = e_1$ since $e_1$ is an identity element. Therefore,

$$\sigma(e_1) = \sigma(e_1) \cdot \sigma(e_1). \tag{7.2}$$

Since $\sigma(e_1) \in G_2$ and $e_2$ is the identity element of $G_2$, we also have

$$\sigma(e_1) = e_2 \cdot \sigma(e_1). \tag{7.3}$$

Using the Right Cancellation Law, we find $\sigma(e_1) = e_2$, as desired.

(ii) Let $a \in G_1$. Then we have $\sigma(a \cdot a^{-1}) = \sigma(a) \cdot \sigma(a^{-1})$ by the fact that $\sigma$
is a group homomorphism. Also, $a \cdot a^{-1} = e_1$, so $\sigma(a \cdot a^{-1}) = \sigma(e_1) = e_2$ by
part (i) above. Therefore, $\sigma(a) \cdot \sigma(a^{-1}) = e_2$, so $\sigma(a^{-1})$ is the inverse of $\sigma(a)$.

(iii) First, the case $k = 1$ follows when $n_1 = 0$ from part (i) above; for
positive powers $n_1$, it follows from the defining property of a group homomorphism
by induction on $n_1$; for negative powers $n_1$, the proof is similar, but
also uses the result of part (ii) above. Then the general case $k > 1$ follows
by induction on $k$, again using the definition of group homomorphism. The
reader is asked to give a detailed proof in Exercise 7.3.
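To see how the negative-power case of part (iii) might go, here is a sketch for the single exponent $n_1 = -3$. The exponent is chosen by us purely for concreteness, and the sketch assumes that the positive-power case has already been established.

```latex
\begin{align*}
\sigma(a^{-3}) &= \sigma\bigl((a^{-1})^{3}\bigr)
    && \text{(meaning of a negative power)} \\
  &= \bigl(\sigma(a^{-1})\bigr)^{3}
    && \text{(positive-power case, applied to } a^{-1}\text{)} \\
  &= \bigl((\sigma(a))^{-1}\bigr)^{3}
    && \text{(part (ii))} \\
  &= (\sigma(a))^{-3}
    && \text{(meaning of a negative power in } G_2\text{)}.
\end{align*}
```

Nothing in the chain depends on the particular value 3, which suggests how the general argument for negative exponents can be organized.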


The defining property of a group homomorphism $\sigma : G_1 \to G_2$ can also
be expressed as follows:

To compute $\sigma$ of the product $a \cdot b$, we can either compute $\sigma(a)$ and $\sigma(b)$,
then take the product of the results; or compute the product $a \cdot b$, and then
take $\sigma$ of this product. The result is the same in both cases.

It seems that we are saying that the order in which we apply the group law
and apply $\sigma$ doesn’t matter. In other words, $\sigma$ “commutes” with the group
law. But in fact, there are actually two different group laws here, so we need
to clarify what we mean. It will help to first rewrite the defining property of
a group homomorphism in function notation instead of operator notation.

Recall that the group law $\cdot_1$ is technically a function

$$\cdot_1 : G_1 \times G_1 \to G_1.$$

Further, recall that in our operator notation for group laws, $a \cdot_1 b$ is just
another way to write $\cdot_1(a, b)$; and similarly for the group law of $G_2$. Thus we
can rewrite the property that $\sigma$ is a group homomorphism as

$$\sigma(\cdot_1(a, b)) = \cdot_2(\sigma(a), \sigma(b)). \tag{7.4}$$


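A diagram of functions will help us explain a precise sense in which $\sigma$ “commutes”
with the two group laws.

$$
\begin{array}{ccc}
G_1 \times G_1 & \xrightarrow{\ \sigma \times \sigma\ } & G_2 \times G_2 \\
\downarrow {\scriptstyle \cdot_1} & & \downarrow {\scriptstyle \cdot_2} \\
G_1 & \xrightarrow{\quad \sigma \quad} & G_2
\end{array}
\tag{7.5}
$$
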
In Diagram 7.5, four sets and four functions are represented. The bottom
row is simply the homomorphism $\sigma$. The two sides are exactly the group
operations of $G_1$ and $G_2$. Finally, the function

$$\sigma \times \sigma : G_1 \times G_1 \to G_2 \times G_2$$

is defined by the natural-seeming formula

$$(\sigma \times \sigma)(a, b) = (\sigma(a), \sigma(b)).$$

Starting with an element $(a, b)$ in the upper-left corner of the diagram, there
are two possible ways to reach the lower-right corner: we can apply $\sigma \times \sigma$
followed by $\cdot_2$, or we can apply $\cdot_1$ followed by $\sigma$. Equation 7.4 says that we
get the same result no matter which of these two paths we take. Another way
to say this is:

$$(\cdot_2) \circ (\sigma \times \sigma) = \sigma \circ (\cdot_1). \tag{7.6}$$

We may describe the property of “commutativity” by saying that the result
of an operation is independent of the order in which we choose the inputs.
Equation 7.6 says that the result in $G_2$ of any input from $G_1 \times G_1$ is independent
of the order in which we choose to move in Diagram 7.5: it doesn’t
matter whether we first move right and then down, or first down and then
right. Thus, we would like to say that there is a sense in which Diagram 7.5
is “commutative.” We make this notion precise in the context of a general
function diagram by means of the following definition.


Definition 7.5. A function diagram is commutative if, for any two paths from 
a set A in the diagram to another set B in the diagram, the corresponding 
composite functions are equal. We also express this by saying that the diagram 
commutes. 


Thus, our final restatement of the group homomorphism property is: A
function $\sigma : G_1 \to G_2$ is a group homomorphism iff Diagram 7.5 commutes.
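For finite groups, commutativity of Diagram 7.5 can be tested by brute force: follow both paths from the upper-left corner for every pair of inputs and compare. The sketch below uses the integers modulo 6, 3, and 4 under addition as test groups; these groups and the function name `diagram_commutes` are our own choices for illustration and are not taken from the text.

```python
from itertools import product


def diagram_commutes(sigma, op1, op2, elements):
    """Check that (op2) o (sigma x sigma) and sigma o (op1) agree on all pairs."""
    return all(op2(sigma(a), sigma(b)) == sigma(op1(a, b))
               for a, b in product(elements, repeat=2))


def add_mod(m):
    """Return addition modulo m as a two-argument function."""
    return lambda a, b: (a + b) % m


Z6 = range(6)

# Reduction mod 3 respects addition mod 6, so the diagram commutes.
print(diagram_commutes(lambda k: k % 3, add_mod(6), add_mod(3), Z6))   # True

# Reduction mod 4 does not: compare the two paths starting from the pair (3, 3).
print(diagram_commutes(lambda k: k % 4, add_mod(6), add_mod(4), Z6))   # False
```

In the second case the two paths disagree at $(3, 3)$: going down and then right gives $0$, while going right and then down gives $2$.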




7.2 Kernels: How Much Did We Lose? 


We now turn to the subject of relations within a group. First, notice that 
every relation can be rewritten in an equivalent form such that the right-hand 
side of the relation is e: 


$$x_1 x_2 \cdots x_m = y_1 y_2 \cdots y_n \iff x_1 x_2 \cdots x_m\, y_n^{-1} \cdots y_2^{-1} y_1^{-1} = e. \tag{7.7}$$


Definition 7.6. We shall refer to the right-hand side of the biconditional in 
Statement 7.7 as the standard form of the relation on the left-hand side. 


Let us return to the group homomorphism $w$ of Equation 7.1. We will
interest ourselves in the relations of $D_{2n}$. As we have seen, three of the relations
in $D_{2n}$ are $F^2 = e$, $R^n = e$, and $FR = R^{-1}F$; the last relation can be rewritten
as $FRF^{-1}R = e$. Now, we have defined a “relation” to be an equation; and
equations are useful tools in algebra, but we prefer not to make a study of
equations as a class of objects (we leave this to the logicians). Luckily, every
relation has an equivalent standard form in which the right-hand side is just
$e$, and so it might appear that every relation can be given by a single group
element, namely, the left-hand side of this standard form. For instance, the
group elements corresponding to the three relations in $D_{2n}$ stated above are
$F^2$, $R^n$, and $FRF^{-1}R$.

But wait a moment: each of these three group elements is just e; that was 
the whole point! So it is not accurate to say that a relation is determined by 
the left-hand side of its equivalent standard form thought of as a single group 
element. Rather, we want to think of a relation as a formal list of symbols. 
This seems familiar: it reminds us of words in a free group. 

Indeed, consider the elements $x^2$, $y^n$, and $xyx^{-1}y$ in the free group on
$\{x, y\}$. These three elements are certainly not equal to $e$ (nor to each other).
The fact that their images under $w$ are all $e$ is the telling point. We realize
that the elements of $\mathrm{Fr}(\{x, y\})$ which map to $e$ under $w$ exactly correspond
to relations in $D_{2n}$. To further our study of relations in a group, we therefore
make the following definition.


Definition 7.7. Let $G_1$ and $G_2$ be groups, and let $e_i$ denote the identity
element of $G_i$. The kernel of a group homomorphism $\sigma : G_1 \to G_2$ is the set
$\ker(\sigma) := \sigma^{-1}(\{e_2\}) = \{a \in G_1 : \sigma(a) = e_2\}$.


Remark 7.8. In the definition of kernel, the group $G_1$ need not be free; the
definition applies to any group homomorphism between any two groups.


We will now describe the connection between kernels and relations in the
general case; the reader should keep in mind the example of the map from
$\mathrm{Fr}(\{x, y\})$ to $D_{2n}$ given above. We see from Proposition 7.4 that if some elements
$a_1, \ldots, a_k$ of $G_1$ satisfy a relation in $G_1$, then their images $\sigma(a_1), \ldots,
\sigma(a_k)$ satisfy the same relation, but in $G_2$. For example, if $a_1 a_2^{-1} a_3^4 = e_1$, then
we must also have $\sigma(a_1)\sigma(a_2)^{-1}\sigma(a_3)^4 = e_2$. But the converse does not hold:
$\sigma(a_1), \ldots, \sigma(a_k)$ may satisfy additional relations not satisfied by $a_1, \ldots, a_k$.
The extent to which we add new relations in going from $G_1$ to $G_2$ is measured
by the kernel $\ker(\sigma)$. A more informal way to express this is to say that the
kernel measures how much $G_1$ “collapses” in being transported to $G_2$.


Example 7.9. Let $G_1 = \mathbb{R}^* := \mathbb{R} - \{0\}$ under ordinary multiplication. Let
$G_2 = \{1, -1\}$, also under multiplication. Consider the “sign” function
$\sigma : G_1 \to G_2$ defined by

$$\sigma(x) = \begin{cases} 1, & \text{if } x > 0; \\ -1, & \text{if } x < 0. \end{cases}$$

To check that $\sigma$ is a group homomorphism, we must check that $\sigma(x) \cdot \sigma(y) =
\sigma(xy)$ for all non-zero real numbers $x$ and $y$. But this statement merely
expresses the familiar rules about signs of products: positive times positive
is positive, positive times negative is negative, and so on. We have
$\ker(\sigma) = \{x \in \mathbb{R}^* : \sigma(x) = 1\} = \mathbb{R}^+$, the set of all positive real numbers.
The map $\sigma$ collapses all positive numbers onto $1$ and all negative numbers onto
$-1$, thus simplifying the structure of the group $\mathbb{R}^*$ down to the sign rules referred
to above. In fact, it may be instructive to replace the set $\{1, -1\}$ with
the set $\{+, -\}$ for $G_2$ in this example.


Example 7.10. Consider the absolute value function $\sigma : \mathbb{R}^* \to \mathbb{R}^*$, $x \mapsto |x|$.
The well-known identity

$$|xy| = |x| \cdot |y| \tag{7.8}$$

says precisely that $\sigma$ is a group homomorphism. We have $\ker(\sigma) = \{x \in \mathbb{R}^* :
|x| = 1\} = \{1, -1\}$. This corresponds to the fact that in taking the absolute
value of a number, we lose the information about its sign, but nothing else is
lost.
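The sign and absolute value homomorphisms can be spot-checked on a finite sample of non-zero numbers. The sketch below uses exact rational arithmetic (`fractions.Fraction`) to avoid rounding questions; the sample and the variable names are our own, and a finite check of this kind only illustrates the identities, it does not prove them.

```python
from fractions import Fraction
from itertools import product


def sign(x):
    """The sign function on non-zero numbers: 1 for positive, -1 for negative."""
    return 1 if x > 0 else -1


# A small sample of non-zero rationals (a stand-in for the non-zero reals).
samples = [Fraction(p, q) for p in (-3, -2, -1, 1, 2, 3) for q in (1, 2, 3)]

sign_ok = all(sign(x * y) == sign(x) * sign(y) for x, y in product(samples, repeat=2))
abs_ok = all(abs(x * y) == abs(x) * abs(y) for x, y in product(samples, repeat=2))

# The portion of each kernel that is visible inside the finite sample.
ker_sign = sorted({x for x in samples if sign(x) == 1})
ker_abs = sorted({x for x in samples if abs(x) == 1})

print(sign_ok, abs_ok)
print(ker_abs)
```

Within the sample, the kernel of the sign map consists of exactly the positive entries, while the kernel of the absolute value map contains only $1$ and $-1$, matching the two examples.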


Example 7.11. Let $\mathbb{R}^+$ denote the set of all positive real numbers. Consider
the natural logarithm function $\ln : \mathbb{R}^+ \to \mathbb{R}$, where we take the group
operation on $\mathbb{R}^+$ to be multiplication and that on $\mathbb{R}$ to be addition. The
identity

$$\ln(x \cdot y) = \ln(x) + \ln(y) \tag{7.9}$$

is exactly the condition we need for $\ln$ to be a group homomorphism; notice
that the operation on the left-hand side of Equation 7.9, inside the logarithm,
is multiplication in $\mathbb{R}^+$, while that on the right is addition in $\mathbb{R}$. We have
$\ker(\ln) = \{x \in \mathbb{R}^+ : \ln(x) = 0\} = \{1\}$. This is the smallest of any of the
kernels we have seen so far. We know in general (by Proposition 7.4) that every
group homomorphism takes the identity element of its domain to the identity
element of its codomain, so the kernel in the present example is as small as it
could be. This tells us that the natural logarithm function loses no information
when transporting the positive real numbers under multiplication to the real
numbers under addition. Indeed, the natural logarithm function is invertible.
The fact that the natural logarithm is a group homomorphism with kernel $\{1\}$
says that $\ln$ faithfully translates multiplication into addition. Because addition
is easier to perform than multiplication, logarithms were used in former times
to ease the computational burden of human calculators (see Exercise 7.4). This
is the reason for the explosive popularity of the logarithm upon its original
discovery, although the idea of a “group” was then unknown.
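The old computational trick can be imitated directly: to multiply two positive numbers, add their logarithms and then undo the logarithm. The following sketch uses floating-point arithmetic, so equality holds only up to rounding; the function name `log_product` is our own.

```python
import math


def log_product(x, y):
    """Multiply two positive reals by adding logarithms and exponentiating."""
    return math.exp(math.log(x) + math.log(y))


x, y = 123.0, 456.0
print(log_product(x, y), x * y)
print(math.isclose(log_product(x, y), x * y, rel_tol=1e-9))
```

The use of `math.exp` to recover the product reflects the invertibility of the natural logarithm noted in Example 7.11.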


Example 7.12. Let $G$ be any group, and let $S = \{x_g : g \in G\}$; that is,
$S$ is a set containing one symbol for each element of $G$. Let $F = \mathrm{Fr}(S)$ be
the free group on $S$. (The reader should think about why we choose not to
simply consider $\mathrm{Fr}(G)$.) We can define a function $\tau : S \to G$ by the formula
$\tau(x_g) = g$. There is a unique way to extend $\tau$ to a group homomorphism
$\sigma : F \to G$, namely, by taking a word in the $x_g$’s to the corresponding term
in the $g$’s. Clearly, $\sigma$ is surjective; for the image of $\tau$ is already all of $G$. We
say that every group is the homomorphic image of a free group.


One theme in group theory is that many “naturally defined” subsets of a 
group turn out to be subgroups. The next result confirms this principle in two 
instances of sets which are naturally associated with a group homomorphism: 
the image and the kernel. 


Theorem 7.13 (Images and kernels are subgroups). Let $\sigma : G_1 \to G_2$ be a
group homomorphism. Then

(i) $\sigma(G_1) \le G_2$, and

(ii) $\ker(\sigma) \le G_1$.


Proof. We use the Subgroup Test for both parts. 

(i) ST0: $\sigma(G_1)$ is non-empty since $\sigma(e_1) \in \sigma(G_1)$.

ST1: Let $y \in \sigma(G_1)$. [Show $y^{-1} \in \sigma(G_1)$] Then $\exists x \in G_1$ such that
$y = \sigma(x)$ (by definition of $\sigma(G_1)$). So $\sigma(x^{-1}) = \sigma(x)^{-1}$ (by Proposition 7.4
part (ii)) $= y^{-1}$. But $x^{-1} \in G_1$, so $y^{-1} \in \sigma(G_1)$.

ST2: Let $y, z \in \sigma(G_1)$. [Show $yz \in \sigma(G_1)$] Then $\exists w, x \in G_1$ such that
$y = \sigma(w)$ and $z = \sigma(x)$ (by definition of $\sigma(G_1)$). So $yz = \sigma(w)\sigma(x) = \sigma(wx)$
(by definition of group homomorphism). But $wx \in G_1$, so $yz \in \sigma(G_1)$.

(ii) ST0: We have $\sigma(e_1) = e_2$ by Proposition 7.4 part (i). Therefore,
$e_1 \in \ker(\sigma)$ (by definition of kernel). So $\ker(\sigma)$ is non-empty.

ST1: Let $x \in \ker(\sigma)$. [Show $x^{-1} \in \ker(\sigma)$] Then $\sigma(x) = e_2$ (by definition
of kernel). So $\sigma(x^{-1}) = \sigma(x)^{-1}$ (by Proposition 7.4 part (ii)) $= e_2^{-1} = e_2$.
Thus we have $x^{-1} \in \ker(\sigma)$.

ST2: Let $w, x \in \ker(\sigma)$. [Show $wx \in \ker(\sigma)$] Then $\sigma(w) = \sigma(x) = e_2$ (by
definition of kernel). So $\sigma(wx) = \sigma(w)\sigma(x)$ (by definition of group homomorphism)
$= e_2 \cdot e_2 = e_2$. Therefore, $wx \in \ker(\sigma)$.


As a first application of Theorem 7.13, we refine Example 7.12. 


Example 7.14. Let $G$ be any group, and suppose that $T$ is a subset of $G$
which generates $G$: that is, $\langle T \rangle = G$. Instead of taking one symbol for each
element of $G$ as we did in Example 7.12, it is enough to take one symbol for
each element of $T$. So let $S = \{x_g : g \in T\}$. There is again a unique group
homomorphism $\sigma : \mathrm{Fr}(S) \to G$ such that $\sigma(x_g) = g$ for each $g$ in $T$. Since
the image of $\sigma$ contains $T$, and is a subgroup of $G$ by Theorem 7.13, it
contains $\langle T \rangle$, which is all of $G$; so $\sigma$ is surjective.


Theorem 7.13 says that the image and the kernel of a group homomorphism 
are subgroups of the codomain and the domain, respectively. This result raises 
the question of whether a sort of converse might be true: given a group G and 
a subgroup H < G, we ask (1) Is H the image of some group homomorphism 
with codomain G?, and (2) Is H the kernel of some group homomorphism 
with domain G? 

To answer question (1), consider the function $\sigma : H \to G$, $\sigma(x) = x$; that
is, $\sigma$ is just the identity function on $H$. It is easy to see that $\sigma$ is a group
homomorphism with image $H$. So the answer to our first question is Yes.

The second question asks, is every subgroup a kernel of a group homo- 
morphism defined on its parent group? Let us consider a small example. Let 
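$G_1 = S_3$, and let $\alpha = (\ldots) \in S_3$. Set $H = \langle \alpha \rangle$. Then $H$ is a group of
order 2. Suppose that there exists a group homomorphism $\sigma : G_1 \to G_2$, for
some group $G_2$, such that $\ker(\sigma) = H$. Then we have $\sigma(\alpha) = e_2$, $\sigma(e_1) = e_2$,
and no other elements of $G_1$ are taken to $e_2$ by $\sigma$. If $\beta$ is any element of
$G_1$, then we must have $\sigma(\alpha\beta) = \sigma(\alpha)\sigma(\beta) = e_2 \cdot \sigma(\beta) = \sigma(\beta)$; similarly,
$\sigma(\beta\alpha) = \sigma(\beta)\sigma(\alpha) = \sigma(\beta) \cdot e_2 = \sigma(\beta)$. In fact, using Proposition 7.4, we can
see that all of the $\alpha$’s “go away” when we apply $\sigma$: for example, we have

$$\sigma(\beta^3 \alpha \beta \alpha^4) = \sigma(\beta^3 \beta) = \sigma(\beta^4).$$

One of the simplest such formulas we can obtain in this manner is

$$\sigma(\beta \alpha \beta^{-1}) = \sigma(\beta \beta^{-1}) = \sigma(e_1) = e_2.$$

Therefore, we have

$$\beta \alpha \beta^{-1} \in \ker(\sigma). \tag{7.10}$$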


Now let us take $\beta = (\ldots)$. Then

$$\beta \alpha \beta^{-1} = (\ldots) \notin \{e_1, \alpha\} = \ker(\sigma). \tag{7.11}$$


This contradiction proves that $H$ cannot be the kernel of any group homomorphism
whose domain is $S_3$.
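A concrete instance of this obstruction can be checked by machine. In the sketch below, the particular permutations `alpha` and `beta` are assumptions made for illustration (a transposition and a 3-cycle of our choosing); they are not necessarily the ones used in the example above. A permutation of $\{1, 2, 3\}$ is stored as the tuple of its images, and `compose(p, q)` applies `q` first.

```python
def compose(p, q):
    """Return p o q (apply q first); permutations are tuples of images of 1, 2, 3."""
    return tuple(p[q[i] - 1] for i in range(3))


def inverse(p):
    """Return the inverse permutation."""
    inv = [0, 0, 0]
    for i, image in enumerate(p, start=1):
        inv[image - 1] = i
    return tuple(inv)


e = (1, 2, 3)
alpha = (2, 1, 3)    # assumed choice: swaps 1 and 2
beta = (2, 3, 1)     # assumed choice: sends 1 to 2, 2 to 3, 3 to 1
H = {e, alpha}

conjugate = compose(compose(beta, alpha), inverse(beta))
print(conjugate, conjugate in H)
```

With these choices the conjugate swaps 2 and 3, so it lies outside $H = \{e, \alpha\}$, which is exactly the kind of failure that Equation 7.10 forbids for a kernel.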

Thus, the answer to our second question is No. The discussion that led 
to our negative answer will also provide insight into the reason behind that 
answer. Equation 7.10 was the obstacle which prevented H from being a kernel. 
Motivated by this condition, we make two definitions: 


Definition 7.15. Let $G$ be a group, and let $\alpha, \gamma \in G$. We say that $\alpha$ and $\gamma$
are conjugates of each other (in $G$) if there exists $\beta \in G$ such that $\gamma = \beta\alpha\beta^{-1}$.
In this case we also say that $\gamma$ is the conjugate of $\alpha$ by $\beta$. The reader should
check that the relationship of being conjugates is indeed symmetric: $\alpha$ and $\gamma$
are conjugates iff $\gamma$ and $\alpha$ are conjugates.


Definition 7.16. Let $G$ be a group and let $H \le G$. Then $H$ is normal in $G$
if

$$\forall \alpha \in H\ \forall \beta \in G, \quad \beta\alpha\beta^{-1} \in H.$$


Notation 7.17. We write $H \trianglelefteq G$ to denote that $H$ is normal in $G$.


Remark 7.18. The definition of normal says that $H$ is normal in $G$ if $H$ is
a subgroup of $G$ which is “closed under conjugation by arbitrary elements of
$G$.”


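Theorem 7.19. Let $\sigma : G_1 \to G_2$ be a group homomorphism. Then $\ker(\sigma) \trianglelefteq G_1$.


Proof. Let $K = \ker(\sigma)$. From Theorem 7.13, we know that $K \le G_1$. It remains
to show that $K$ is closed under conjugation by arbitrary elements of $G_1$. So
let $\alpha \in K$ and let $\beta \in G_1$. [Show: $\beta\alpha\beta^{-1} \in K$; that is, show $\sigma(\beta\alpha\beta^{-1}) = e_2$]
We compute

$$
\begin{aligned}
\sigma(\beta\alpha\beta^{-1}) &= \sigma(\beta)\,\sigma(\alpha)\,(\sigma(\beta))^{-1} && \text{(by Proposition 7.4)} \\
&= \sigma(\beta)\, e_2\, (\sigma(\beta))^{-1} && \text{(since } \alpha \in K = \ker(\sigma)\text{)} \\
&= e_2.
\end{aligned}
$$

This shows that $\beta\alpha\beta^{-1} \in K$, as desired.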


7.3 Cosets


Being a subgroup was not enough to be a kernel; is it too much to hope that
every normal subgroup is a kernel? Is there a converse to Theorem 7.19? Given
a normal subgroup $K \trianglelefteq G$, we would like to dream up a group $H$ and a group
homomorphism $\sigma : G \to H$ such that $\ker(\sigma) = K$.

Suppose that $x \in G$, $y \in H$, and $\sigma(x) = y$. Our starting point will be to
treat $x$ as an unknown and solve this equation: that is, to investigate the pre-image
of $y$ under $\sigma$, i.e. $\sigma^{-1}(\{y\})$. Because we will write this so often, we will
take the liberty of dropping the set braces and writing simply $\sigma^{-1}(y)$; this does
not imply that $\sigma$ has an inverse function! Well, what is $\sigma^{-1}(y)$? Suppose to
start with that we have some solution $z \in \sigma^{-1}(y)$. Let us choose an arbitrary
$w \in \sigma^{-1}(y)$. Then our idea is that $w$ and $z$ are treated the same way by $\sigma$; so
$w$ and $z$ are “equivalent” according to $\sigma$; thus $z^{-1}w$ should be equivalent to
$e_G$ according to $\sigma$, since a group homomorphism respects group relations. But
a group homomorphism sends the identity element to the identity element, so
we should have $\sigma(z^{-1}w) = e_H$.

More precisely, we calculate $\sigma(z^{-1}w) = \sigma(z)^{-1}\sigma(w) = y^{-1}y = e_H$. Thus,
we can say that $z^{-1}w \in \ker(\sigma) = K$. By setting $k = z^{-1}w$, we can write
$w = zk$.

Conversely, it is not hard to check that any element of $G$ of the form
$w = zk$, where $k \in K$, has the property that $w \in \sigma^{-1}(y)$. Thus we want to
assert that

$$\sigma^{-1}(y) = \{zk \mid k \in K\}. \tag{7.12}$$

Because of the natural importance of this type of set, we make a definition: 


Definition 7.20. Let $T$ be a set with an associative binary operation (written
as multiplication), and let $S \subseteq T$. Let $z \in T$. Then

$$zS := \{zs \mid s \in S\}$$

and

$$Sz := \{sz \mid s \in S\}.$$

We call $zS$ a left coset of $S$ in $T$ and $Sz$ a right coset of $S$ in $T$.


Remark 7.21. In the definition of coset above, we only required $S$ to be a
subset of the set $T$; in our previous discussion which motivated this definition,
the place of $S$ was taken by a normal subgroup of a group. In practice, the
concept of a coset of $S$ in $T$ is most useful when $T$ is a group and $S$ is a
subgroup of $T$, though not necessarily a normal subgroup.


We summarize our discussion so far by stating the following lemma and 
corollary; the reader is asked to supply a complete proof of the lemma in 
Exercise 7.5. 


Lemma 7.22. Let $\sigma : G \to H$ be a group homomorphism with kernel $K$.
Then the pre-image of an element in the range of $\sigma$ is a coset of $K$ in $G$.
More precisely, let $x \in G$ and let $y = \sigma(x)$; then $\sigma^{-1}(y) = xK = Kx$.


Corollary 7.23. If $\sigma : G \to H$ is a group homomorphism with kernel $K$,
then there is a bijective correspondence $\psi$ from the set of all left cosets of $K$
in $G$ to the image of $\sigma$, given by $\psi(xK) = \sigma(x)$ for $x$ in $G$.


Proof. First, we check that $\psi$ is well-defined. Suppose that a given coset $C$ of
$K$ in $G$ can be written in two ways, as $C = xK = yK$, with $x, y \in G$. [We
must show that $\sigma(x) = \sigma(y)$.] Then we have $e_G \in K$, so $x = x \cdot e_G \in xK$. Since
$xK = yK$, we have $x \in yK$, so $x = yk$ for some $k \in K$. Thus $\sigma(x) = \sigma(yk)
= \sigma(y)\sigma(k) = \sigma(y)e_H = \sigma(y)$, as required.

Next, we show that $\psi$ is injective. Suppose that $\psi(C) = \psi(D)$ where $C$
and $D$ are left cosets of $K$ in $G$. Then we can write $C = xK$ and $D = wK$
for some $x, w \in G$, by definition of left coset; and using the definition of $\psi$,
we have $\sigma(x) = \sigma(w) =: y$. By Lemma 7.22, we have $\sigma^{-1}(y) = xK$ and also
$\sigma^{-1}(y) = wK$, so $C = D$.

Finally, we show that $\psi$ is surjective. Let $y \in \sigma(G)$. Then $y = \sigma(x)$ for
some $x \in G$. So $y = \psi(xK)$, and $y$ is in the image of $\psi$.
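Lemma 7.22 and Corollary 7.23 can be observed in a small case. The sketch below uses a homomorphism that is our own choice and has not been introduced in the text: the sign of a permutation, viewed as a map from $S_3$ to $\{1, -1\}$ and computed by counting inversions (that this map is a homomorphism is a standard fact, assumed here). Permutations are stored as tuples of images, and `compose(p, q)` applies `q` first.

```python
from itertools import permutations


def compose(p, q):
    """Return p o q (apply q first); permutations are tuples of images of 1, 2, 3."""
    return tuple(p[q[i] - 1] for i in range(3))


def sign(p):
    """Sign of a permutation: +1 for an even number of inversions, -1 for odd."""
    inversions = sum(1 for i in range(3) for j in range(i + 1, 3) if p[i] > p[j])
    return -1 if inversions % 2 else 1


S3 = list(permutations((1, 2, 3)))
K = frozenset(p for p in S3 if sign(p) == 1)          # the kernel of sign


def preimage(y):
    return frozenset(p for p in S3 if sign(p) == y)


def left_coset(x):
    return frozenset(compose(x, k) for k in K)


def right_coset(x):
    return frozenset(compose(k, x) for k in K)


# Lemma 7.22: the pre-image of sign(x) equals both xK and Kx, for every x.
lemma_ok = all(preimage(sign(x)) == left_coset(x) == right_coset(x) for x in S3)

# Corollary 7.23: as many distinct left cosets as elements in the image.
cosets = {left_coset(x) for x in S3}
image = {sign(x) for x in S3}
print(lemma_ok, len(cosets) == len(image))
```

Storing cosets as `frozenset` objects lets Python treat two cosets as equal exactly when they contain the same elements, which mirrors the well-definedness check in the proof.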


Before resuming our quest for a converse to Theorem 7.19, we establish 
some natural properties of the multiplication of group elements with cosets. 


Lemma 7.24. Let $G$ be a group. Let $S \subseteq G$ and $x, y \in G$. Then
(i) $x(yS) = (xy)S$, and
(ii) $x(Sy) = (xS)y$.


Proof. We prove (i); the proof of (ii) is similar. Let $T = \{xyz : z \in S\}$. We
will show that $T = x(yS)$.

[Show $T \subseteq x(yS)$] Let $t \in T$. Then $t = xyz$ for some $z \in S$. Now $yz \in yS$,
so $t = xyz = x(yz) \in x(yS)$.

[Show $x(yS) \subseteq T$] Let $w \in x(yS)$. Then $w = xa$ for some $a \in yS$, and
$a = yb$ for some $b \in S$. So $w = x(yb) = xyb \in T$.

We have shown that $T = x(yS)$; similarly, it can be shown that $T = (xy)S$.
Thus $x(yS) = (xy)S$.


Lemma 7.22 says, in part, that the left and right cosets aK and Ka are the 
same, when K is the kernel of a group homomorphism. Since we are wondering 
whether every normal subgroup is also a kernel, it makes sense to ask whether 
aN and Na are equal when N is a normal subgroup; the affirmative answer, 
given in the following lemma, brings us another step closer to our goal: 


Lemma 7.25. If N ⊴ G and a ∈ G, then aN = Na.


Proof. Suppose that N ⊴ G and a ∈ G. [ Show aN ⊆ Na ] Let x ∈ aN. Then
x = ay for some y ∈ N. [ Show x = za for some z ∈ N ] [ Then z = xa⁻¹ ]
Let z = xa⁻¹. Then z = aya⁻¹, the conjugate of the element y ∈ N by the
element a ∈ G. By definition of normal, we have z ∈ N. Now x = ay = aya⁻¹a
= za ∈ Na. Therefore, aN ⊆ Na. Similarly, Na ⊆ aN, which concludes the
proof.


7.4 Quotient Groups 


Next, we shift our creativity into high gear to complete our program of
constructing a group homomorphism with prescribed domain and kernel. So
suppose again that G is a group and N ⊴ G. Suppose that σ : G → H is a group
homomorphism with kernel N, as desired. Observe that we are free to replace
H by any subgroup L of H, as long as L contains the image σ(G). Indeed, the
codomain of a function is arbitrary except for the requirement that it must
contain the image of the domain. We know from Theorem 7.13 that the image
σ(G) itself is a subgroup of H. So we may take L = σ(G) and consider σ as
a group homomorphism from G to the group σ(G),


σ : G → σ(G).


Note that we have not changed the map σ at all; we have merely changed
our point of view about the codomain of σ. From this point of view, σ is
surjective. So our first insight is that we may as well look for a surjective
group homomorphism σ with domain G and kernel N.

Thus, to further our discussion, we assume that σ : G → H is a surjective
group homomorphism. Now, σ may not be one-to-one, but the Corollary to
Lemma 7.22 says that we do have a one-to-one correspondence between the
cosets of N in G and the elements of H. To be specific, let us use left cosets.
Then σ carries each left coset xN (for x ∈ G) to a single element σ(x) of H.
The left cosets of N in G mirror the elements of H.

Now recall that we were given G and N, with N ⊴ G, and we had to find
H and σ (if possible) such that σ : G → H is a group homomorphism with
kernel N. Our final idea in this program is quite radical. We have seen that
if a suitable H and σ exist, then we may assume that σ is surjective, and
then the elements of H must be in bijective correspondence, via σ⁻¹, with
the left cosets of N in G. Our idea is this: We will make a new group Q out
of the left cosets of N in G, such that σ⁻¹ : H → Q is a bijective group
homomorphism.

We proceed to deduce what the group law on these cosets must be. First,
the reader should verify that the inverse of σ⁻¹ isn’t actually σ, but instead
the function τ : Q → H such that τ(xN) = σ(x) for x in G. By Exercise 7.10,
τ must be a group homomorphism. For xN, yN ∈ Q, we have τ(xN · yN) =
τ(xN)τ(yN) = σ(x)σ(y) = σ(xy) (since σ is a homomorphism) = τ((xy)N).
Now τ is bijective, so in particular τ is injective, and we must have xN · yN =
(xy)N.

We have shown that, given groups N and G with N ⊴ G, if there is a group
homomorphism with domain G and kernel N, then the set of all left cosets
of N in G is itself a group, with multiplication defined in a very natural way,
namely xN · yN = (xy)N. This motivates the following result.


Theorem 7.26. Let G be a group and let N be a normal subgroup of G. Let
Q = {xN : x ∈ G},
the set of all left cosets of N in G. Define a binary operation · on Q by
xN · yN = (xy)N.


Then Q is a group, called the quotient group of G by N, and written G/N
(pronounced “G mod N”).




Proof. First, we must show that · is well-defined on Q. Suppose that x₁, y₁, x₂,
y₂ ∈ G and that x₁N = x₂N and y₁N = y₂N. [ Show (x₁y₁)N = (x₂y₂)N ]
Then (x₁y₁)N = x₁(y₁N) = x₁(y₂N) = x₁(Ny₂) (by Lemma 7.25) = (x₁N)y₂
= (x₂N)y₂ = (Nx₂)y₂ = N(x₂y₂) = (x₂y₂)N.

Next, we verify the group axioms.

G1: Let x, y, z ∈ Q. Then we can write x = aN, y = bN, z = cN for
some a, b, c ∈ G. We have (xy)z = (aN · bN) · cN = (ab)N · cN = ((ab)c)N
= (a(bc))N (by Axiom G1 for the group G) = aN · (bc)N = aN · (bN · cN)
= x(yz).

G2: We claim that N is an identity element for · on Q. Notice that N =
e_G N, so N is a left coset of N in G, hence N ∈ Q. Further, if a ∈ G, then we
have aN · e_G N = (a e_G)N = aN, and similarly, e_G N · aN = aN.

G3: We can see that a⁻¹N is an inverse for aN, since a⁻¹N · aN = e_G N =
aN · a⁻¹N.


Remark 7.27. Theorem 7.26 requires N to be a normal subgroup of G; the 
theorem is not true otherwise. The reader should examine the proof to see 
where the assumption of normality is used: it is only used to verify that the 
group law on Q is well-defined. 
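The role of normality can be tested by brute force in a small group. The sketch below is an illustration under our own conventions, not a construction from the text: it models S₃ as tuples listing the images of 0, 1, 2, and the helper names compose, left_coset, and product_is_well_defined are hypothetical. It asks whether the coset (xy)H depends only on the cosets xH and yH, first for the subgroup of 3-cycles and then for a subgroup generated by a transposition.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'q first, then p', i.e. i |-> p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))

S3 = list(permutations(range(3)))

def left_coset(x, H):
    return frozenset(compose(x, h) for h in H)

def product_is_well_defined(H):
    """True iff (xy)H is the same for every choice of representatives x, y."""
    for x1 in S3:
        for x2 in S3:
            if left_coset(x1, H) != left_coset(x2, H):
                continue
            for y1 in S3:
                for y2 in S3:
                    if left_coset(y1, H) != left_coset(y2, H):
                        continue
                    lhs = left_coset(compose(x1, y1), H)
                    rhs = left_coset(compose(x2, y2), H)
                    if lhs != rhs:
                        return False
    return True

A3 = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]  # identity and the two 3-cycles: normal
H = [(0, 1, 2), (1, 0, 2)]              # identity and one transposition: not normal
```

Calling product_is_well_defined(A3) and product_is_well_defined(H) shows the contrast: the rule xN · yN = (xy)N is consistent for the normal subgroup and breaks down for the other.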


Remark 7.28. We reiterate that an element of G/N is a set: more precisely, a 
left coset of N in G. The set N itself is the identity element of G/N. 


Example 7.29. Let n ∈ Z. We claim that ⟨n⟩ ⊴ (Z,+). First, recall that ⟨n⟩
is just the set of all integer multiples of n, so that we can write ⟨n⟩ = nZ. To
show that nZ ⊴ Z, we must show that for all x ∈ Z and y ∈ nZ, we have
xyx⁻¹ ∈ nZ. Since we use additive notation in (Z,+), xyx⁻¹ really means
x + y − x, which is just y, and hence is in nZ, as desired.

The elements of Z/nZ are the cosets of nZ in Z. They look like a + nZ
where a ∈ Z. This is exactly the congruence class of all integers congruent to
a modulo n.

As a special case, when n = 3, the elements of Z/nZ are:


0 + 3Z = {..., −6, −3, 0, 3, 6, ...}
1 + 3Z = {..., −5, −2, 1, 4, 7, ...}
2 + 3Z = {..., −4, −1, 2, 5, 8, ...}


These three cosets together contain all the integers. Any coset of ⟨3⟩ in Z
can be realized as one of these three. For instance, 8 + 3Z contains 8 + 3·0 = 8,
8 + 3·(−1) = 5, 8 + 3·1 = 11, etc.; that is, all integers congruent to 8 modulo 3,
which is the same as all integers congruent to 2 modulo 3. So 8 + 3Z = 2 + 3Z.
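These coset computations are easy to reproduce. In the minimal sketch below, coset_window is a hypothetical helper of our own; since a coset a + nZ is infinite, it lists only the elements lying in a finite window of integers.

```python
def coset_window(a, n, lo=-6, hi=8):
    """Elements of the coset a + nZ that lie in the interval [lo, hi]."""
    return [k for k in range(lo, hi + 1) if (k - a) % n == 0]

# The three cosets of 3Z in Z, restricted to the window.
cosets_mod_3 = [coset_window(a, 3) for a in range(3)]

# Two representatives of the same coset give the same list.
same_coset = coset_window(8, 3) == coset_window(2, 3)
```

The comparison in the last line reflects the fact that 8 and 2 are congruent modulo 3.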

In general, when n > 0, the order of Z/nZ is the number of congruence
classes modulo n in Z, which is n: these congruence classes are 0 + nZ, 1 + nZ,
..., (n − 1) + nZ. We also use the notation Zₙ for Z/nZ when n > 0. The
notation Zₙ is used primarily in elementary texts on abstract algebra to denote
the quotient group Z/nZ. In the branch of mathematics known as number
theory, the notation Zₚ (when p is a prime integer) denotes an entirely different
object, the set of so-called “p-adic integers,” which we shall not study here.

We are finally ready to prove a converse to Theorem 7.19. Given a group
G and a subgroup N ⊴ G, the quotient group G/N provides a target for a
group homomorphism with domain G and kernel N:


Theorem 7.30. Let G be a group and let N ⊴ G. Define a function σ :
G → G/N by a ↦ aN. Then σ is a surjective group homomorphism, and
ker(σ) = N. We call σ the natural map from G to G/N.


Proof. Suppose that N ⊴ G, and define σ as above. [ Show σ is a
homomorphism ] Let a, b ∈ G. Then σ(a·b) = (a·b)N (by definition of σ) = aN · bN
(by definition of the group law on G/N) = σ(a) · σ(b). So σ is a group
homomorphism.

[ Show σ is surjective ] Let x ∈ G/N. Then x = aN for some a ∈ G. So
x = σ(a). Thus, σ is surjective.

[ Show ker(σ) = N ]

[ Show ker(σ) ⊆ N ] Let a ∈ ker(σ). Then a ∈ G and σ(a) = e_{G/N}, by
definition of kernel. But e_{G/N} = N, and σ(a) = aN. Thus we have aN = N.
Since N ≤ G, we have e_G ∈ N, which gives a·e_G ∈ aN = N. Thus, a ∈ N.
We have shown that ker(σ) ⊆ N.

[ Show N ⊆ ker(σ) ] Let a ∈ N. [ Show a ∈ ker(σ); this means σ(a) =
e_{G/N} = N ] Then σ(a) = aN (by definition of σ). [ Show aN = N ] Let
x ∈ aN. Then x = ab for some b ∈ N. So x ∈ N (by ST2). Conversely, if
y ∈ N, then y = aa⁻¹y, and a⁻¹y ∈ N (by ST1 and ST2), so y ∈ aN. Thus
aN = N, so σ(a) = N = e_{G/N}. Hence a ∈ ker(σ), and we have shown that
N ⊆ ker(σ).

Therefore, ker(σ) = N, which completes the proof.
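The natural map can also be built explicitly for a small group. The following sketch is only an illustration under our own assumptions: G is S₃ modeled as tuples of images of 0, 1, 2, N is its subgroup of 3-cycles, cosets are stored as frozensets, and the names natural_map and coset_product are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'q first, then p', i.e. i |-> p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))

G = list(permutations(range(3)))
N = frozenset({(0, 1, 2), (1, 2, 0), (2, 0, 1)})  # identity and the two 3-cycles

def natural_map(a):
    """sigma(a) = aN, represented as a frozenset of permutations."""
    return frozenset(compose(a, n) for n in N)

def coset_product(C, D):
    """Group law on G/N: choose representatives x, y and return (xy)N."""
    x, y = next(iter(C)), next(iter(D))
    return natural_map(compose(x, y))

quotient = {natural_map(a) for a in G}

# The identity of G/N is the coset N itself, so the kernel is {a : aN = N}.
kernel = {a for a in G if natural_map(a) == N}

is_homomorphism = all(
    natural_map(compose(a, b)) == coset_product(natural_map(a), natural_map(b))
    for a in G
    for b in G
)
```

In this example the quotient has two elements, every coset is the image of some element by construction, and the computed kernel coincides with N.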


Corollary 7.31 ( kernel ⟺ normal ). Let G be a group and let S ⊆ G.
Then the following are equivalent:
(i) there exists a group homomorphism with domain G and kernel S;
(ii) S ⊴ G.

Proof. Combine Theorem 7.19 and Theorem 7.30.


7.5 Exercises 
Exercise 7.1. Suppose that we tried to define a function 
6: Don > Fr({a, y}) 


like the function w of Equation 7.1, but in the opposite direction, sending a 
word on {F, R} to the corresponding word on {x,y}. What is wrong with such 
a definition? 




Exercise 7.2. Consider a group (G,-). Create a function diagram with the 
property that G is a commutative group iff the diagram commutes. Hint: Use 
G and G x G at least once each, and find appropriate functions. 


Exercise 7.3. Provide a detailed proof of Proposition 7.4 part (iii) by following
the steps outlined in the proof given in the text. 


Exercise 7.4. While not strictly an exercise in abstract algebra, this problem 
explores the practical significance of the logarithm function stemming from the 
fact that it is a bijective group homomorphism. Let us consider three different 
methods for multiplying two real numbers where each number is known to d 
significant digits in base ten. 

(a) In the “Grade-School Multiplication” method, we multiply the first 
number by each digit of the second number, shifting to adjust for the place of 
the digits; then we add the resulting rows to get the final answer. Show that 
this method takes at least d² operations to complete.

(b) In the “Exhaustive Pre-computation” method, we buy a book contain- 
ing the answer to every multiplication problem involving two d-digit base-ten 
numbers. Show that this book must contain 10^(2d) entries.

(c) In the “Abstract Algebra” method, we buy a book containing the base- 
ten logarithm of every d-digit base-ten number. To compute a product x-y, we 
look up the logarithms of x and y, add them together, and finally we look up 
the number z whose logarithm is this sum. Show that our book only needs to 
contain 10^d entries, and we only need to perform about d operations, although
our answer will only be an approximation. (More accurately, each lookup takes 
linear time in d, and so does the addition.) Thus in the case d = 5, method 
(b) is already infeasible due to the size of the book, and method (c) is (naively 
speaking) already 5 times faster than method (a). 

Exercise 7.5. Prove Lemma 7.22. 


Exercise 7.6. Prove that if H ≤ G and for all a ∈ G we have aH = Ha, then
H ⊴ G. (Note that this is a sort of converse to Lemma 7.25.)

Exercise 7.7. Suppose that σ : G₁ → G₂ is a group homomorphism and
H ≤ G₁. Give a simple argument showing that σ(H) ≤ G₂.

Exercise 7.8. (a) Let σ : G₁ → G₂ be a surjective group homomorphism.
Prove that if N ⊴ G₁, then σ(N) ⊴ G₂.

(b) Show that the result of part (a) above fails in general if σ is not
surjective.

Exercise 7.9. Suppose that σ : G → H and τ : H → L are group
homomorphisms. Prove that τ ∘ σ is a group homomorphism, and that
ker(τ ∘ σ) ⊇ ker(σ).

Exercise 7.10. Prove that if σ : G → H is a bijective group homomorphism,
then the function σ⁻¹ : H → G is also a bijective group homomorphism.
Exercise 7.11. Let G be the group of Exercise 4.10 part (e). Set


H = \left\{ \begin{pmatrix} a & b \\ -b & a \end{pmatrix} : a, b \in \mathbb{R}, \text{ not both } 0 \right\}.


Find an injective group homomorphism from C^× to G whose image is H,
where C^× is the group of all non-zero complex numbers under ordinary
multiplication. Why does this immediately imply that H ≤ G?


Exercise 7.12. Let σ : G₁ → G₂ be a group homomorphism. Let H = σ(G₁)
be the image of G₁ under σ, and let K = ker(σ). Prove that H is abelian iff
for all a, b in G₁ we have aba⁻¹b⁻¹ ∈ K.


Exercise 7.13. (a) For each cyclic subgroup of S₃, compute (i) all of its right
cosets in S₃, and (ii) all of its left cosets in S₃.
(b) Which cyclic subgroups of S₃ are normal in S₃?

Exercise 7.14. Construct group tables for the following groups: (a) Zs (b) Za 
(c) Zs 

Exercise 7.15. Let G be a group and let N ⊴ G. Let C, D ∈ G/N. Let C · D
denote the product of C with D computed using the group law on G/N given
in the statement of Theorem 7.26. Let C ∗ D denote the “setwise product” of C
with D, namely C ∗ D := {x·y : x ∈ C and y ∈ D}. Prove that C · D = C ∗ D.


Exercise 7.16. Let G be a group and let H ≤ G. Define the normalizer of H
in G to be
N_G(H) = {g ∈ G : gHg⁻¹ = H}.


(a) Prove that N_G(H) ≤ G.

(b) Prove that H ⊴ N_G(H).

(c) Prove that if L ≤ G and H ⊴ L, then L ⊆ N_G(H). Thus, N_G(H) is the
largest subgroup of G in which H is normal.


Exercise 7.17. Let G be a group, and let N ⊴ G. Set Q = G/N. Let
S = {H : H ≤ G and H ⊇ N}


and let
T = {L : L ≤ Q}.


(a) For H ∈ S, define σ(H) = {xN : x ∈ H} ⊆ Q. Prove that for all
H ∈ S, we have σ(H) ∈ T.

(b) For L ∈ T, define τ(L) = {x ∈ G : xN ∈ L} ⊆ G, called the lift of L
to G. Prove that τ is a function from T to S.

(c) Prove that σ and τ are inverse to each other. Thus, there is a natural
one-to-one correspondence between subgroups of G/N and subgroups of G
which contain N.

(d) Prove that τ(L) = ⋃_{C ∈ L} C = {xy : x ∈ G, y ∈ N, and xN ∈ L}.




8 


Lagrange’s Theorem 


8.1 Cosets and Partitions 


The concepts and definitions which naturally arose when we studied group 
homomorphisms are so fruitful that we have yet to exhaust their potential. 
Among these is the concept of a coset, which we next examine in greater detail. 
Recall that we defined the left coset xS for a subset S of a group G and an
element x of G to be

xS = {xs | s ∈ S}.


Remark 8.1. The discussion and results in this section deal mostly with left 
cosets, but our results apply equally well in the case of right cosets. In partic- 
ular, Theorem 8.7 and Lemma 8.8 are both true with left cosets replaced by 
right cosets. 


We were motivated to define the concept of a coset by the fact that, given
a group homomorphism σ : G → H, the pre-image σ⁻¹(y) of an element
y ∈ σ(G) is a left coset of the kernel K = ker(σ). Specifically, we have σ⁻¹(y) =
xK for any element x ∈ G such that σ(x) = y (Lemma 7.22).

Now, if f : D → C is any function, from any set D to any set C, then
the pre-images of the points of C always form a partition of D; that is, these 
pre-image sets cover all of D and are mutually disjoint. Because the concept 
of a partition is an important one which will occur again, we take a brief 
digression to develop these ideas; we start with a formal definition: 


Definition 8.2. Let S be a set. A partition of S is a set C consisting of
subsets of S such that

S = ⋃̇_{B ∈ C} B.


Here, the dot in the union symbol indicates a disjoint union; that is, a union
of disjoint sets.


DOI: 10.1201/9781003252139-8


One way to think about a partition C of a set S is that the sets in C neatly
carve S into separate regions or “classes.” We may say that two elements x and
y of S are “related” to each other if they are in the same class; that is, if there
exists B ∈ C such that x, y ∈ B. Going in the other direction, we will define
the general notion of a “relation” on a set. The intention is to define what
it means for two elements to be “related” to each other in some way. Given
two elements x, y ∈ S (possibly identical), we just need to know whether x is
related to y; that is, we need to know whether this pair of elements satisfies
our relation. Thus, in the extreme economy typical of mathematics, we define
a relation on S to be a set of ordered pairs from S:


Definition 8.3. A relation on a set S is a set ~ ⊆ S². We usually use operator
notation for a relation; hence, writing “a ~ b” means (a, b) ∈ ~. We read this
as “a is related to b under ~.”


If we try to form a partition of S from a given relation ~ on S, then we
can run into problems; to start with, we need to be able to describe the classes
carved out by such a partition. For a ∈ S, we would like to define the “class
of a” to be the set

[a] := {b ∈ S : b ~ a}.


We would like to know when these sets [a] (for a ∈ S) will form a partition
of S whose classes correspond to the sets [a]. For this to happen, we certainly
want a to be in its own class, i.e. a ∈ [a]; likewise, we would like two classes
[a] and [b] never to “partially overlap”: that is, we should not have [a] ∩ [b] ≠ ∅
unless [a] = [b]. To get a relation whose classes behave in such a nice way,
we will impose extra properties on our relation. The reader should verify that
the following three properties will be true for any relation defined from the
classes of a partition:


Definition 8.4. A relation ~ on a set S is called reflexive if ∀a ∈ S, a ~ a;
symmetric if ∀a, b ∈ S, a ~ b ⟹ b ~ a; and transitive if ∀a, b, c ∈ S, (a ~ b
and b ~ c) ⟹ a ~ c.


A relation satisfying all three of these properties is singled out for special 
status: 


Definition 8.5. We say that a relation ~ is an equivalence relation if ~ is
reflexive, symmetric, and transitive. If ~ is an equivalence relation on S and
a ∈ S, then the equivalence class of a (under ~) is the set [a] := {b ∈ S :
b ~ a}.


It turns out that we now have enough to realize our hope and go backwards 
from a relation to a partition: 


Lemma 8.6. Suppose that ~ is an equivalence relation on a set S. Then the 
equivalence classes of S under ~ partition S. 


Proof. Exercise 8.3. 
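A small computation can make the statement of Lemma 8.6 concrete without replacing its proof. In the sketch below, equivalence_classes is a hypothetical helper of our own, and the relation chosen (congruence modulo 3 on a finite window of integers) is just one convenient example of an equivalence relation.

```python
def equivalence_classes(S, related):
    """Return the set of classes [a] = {b in S : b ~ a} for a in S.

    The caller is assumed to pass an equivalence relation; the function
    does not check reflexivity, symmetry, or transitivity.
    """
    return {frozenset(b for b in S if related(b, a)) for a in S}

S = range(-6, 9)

def congruent_mod_3(a, b):
    return (a - b) % 3 == 0

classes = equivalence_classes(S, congruent_mod_3)

# A partition covers S, and its classes do not overlap.
covers_S = set().union(*classes) == set(S)
no_overlap = sum(len(c) for c in classes) == len(S)
```

Here the 15 integers in the window fall into three classes of five elements each, one for each residue modulo 3.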


Returning to our main discussion, we know from Exercise 8.4 that the
pre-images of points in the image of a function f : D → C will partition the
function’s domain: that is, we always have


D = ⋃̇_{y ∈ C} f⁻¹(y).




It follows that if K is the kernel of some group homomorphism with domain 
G, then the left cosets of K in G should partition G. By Corollary 7.31, to be 
a kernel is the same thing as to be a normal subgroup. Thus, the left cosets of 
a normal subgroup must partition the group. The next result shows that we 
can remove the normality condition from the previous statement: 


Theorem 8.7. Let H ≤ G. Then:
(i) For all a, b ∈ G, either aH = bH or aH ∩ bH = ∅.
(ii) The left cosets of H in G partition G.


Proof. (i) Let a, b ∈ G. Suppose that aH ∩ bH ≠ ∅. [ Show aH = bH ] [ Show
aH ⊆ bH and bH ⊆ aH ] Then ∃x ∈ aH ∩ bH, so we can write x = ah₁ = bh₂
for some h₁, h₂ ∈ H.
Let y ∈ aH. Then y = ah for some h ∈ H. Now a = bh₂h₁⁻¹, so y =
bh₂h₁⁻¹h. We have h₂h₁⁻¹h ∈ H by ST1 and ST2, so y ∈ bH. Thus aH ⊆ bH.
Similarly, we can show that bH ⊆ aH. So aH = bH, as desired.
(ii) Formally, we must show that


G = ⋃̇_{C ∈ ℒ} C,


where ℒ is the set of all left cosets of H in G.
Let x ∈ G. Then x = x·e ∈ xH (since e ∈ H). This shows that G ⊆
⋃_{C ∈ ℒ} C.
To show that ⋃_{C ∈ ℒ} C ⊆ G, we simply observe that each left coset C of H
in G is a subset of G. (Indeed, everything in our discussion is inside of G!)
By part (i), the union is disjoint. This completes the proof of (ii).
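Theorem 8.7 does not require normality, and a quick experiment illustrates this. The sketch below is our own example rather than one from the text: it takes G = S₃, modeled as tuples of images of 0, 1, 2, and a subgroup H generated by a transposition, which is not normal in S₃.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'q first, then p', i.e. i |-> p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))

G = list(permutations(range(3)))
H = [(0, 1, 2), (1, 0, 2)]  # identity and the transposition swapping 0 and 1

left_cosets = {frozenset(compose(a, h) for h in H) for a in G}

# Part (ii): the cosets cover G.
covers_G = set().union(*left_cosets) == set(G)

# Part (i): two cosets are either equal or disjoint.
equal_or_disjoint = all(
    C == D or not (C & D) for C in left_cosets for D in left_cosets
)
```

The six elements of S₃ split into three left cosets of two elements each.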


8.2 The Size of Cosets 


The next result implies that cosets of the same set (within a group) always 
have the same size, even if that set is not a subgroup: 


Lemma 8.8. Let G be a group and let S ⊆ G. Let a ∈ G. Then the function
f : S → aS given by s ↦ as is a bijection. Thus all left cosets of S in G
have the same size as S, and hence the same size as each other.


Proof. [ Show f is injective ] Let x, y ∈ S, and suppose that f(x) = f(y).
Then ax = ay, so x = y by Lemma 3.46.
[ Show f is surjective ] Let z ∈ aS. Then z = as for some s ∈ S, and
f(s) = z.


Definition 8.9. Let G be a group, let H ≤ G, and let C be a coset of H in
G. An element of C is called a representative of C, or a coset representative
for C.




Remark 8.10. Suppose that H ≤ G and a ∈ G. Let C = aH. Since e ∈ H, we
always have a = ae ∈ aH, so a is a coset representative for C. Also, if b ∈ aH,
then likewise b = be ∈ bH, so we must have bH = aH by Theorem 8.7. Thus,
any coset representative b for C has the property that C = bH.


Warning 8.11. Cancellation does not work for coset representatives: aH = bH
does not imply a = b in general. This is clear from Remark 8.10, since we have
aH = bH for every b ∈ aH!


Definition 8.12. Let H ≤ G. The index of H in G is the number of left
cosets of H in G.


Notation 8.13. The index of H in G is denoted [G: H]. 


Remark 8.14. The index of a subgroup in a group is either a positive integer, 
or else it is infinite. A finite group can only have subgroups of finite index, but 
an infinite group can have subgroups of finite or infinite index. The bigger a 
subgroup is, the smaller the index of that subgroup will be. At one extreme, 
the index of H in G is one if and only if H = G; this is Exercise 8.1. 


Definition 8.15. Let G be a group and let H ≤ G. A complete set of (left)
coset representatives for H in G is a set R ⊆ G such that every left coset of
H in G contains exactly one element of R.


Remark 8.16. We can form a complete set of coset representatives for H in 
G simply by choosing one element from every left coset of H in G; this must 
work by Theorem 8.7. (The innocuous-seeming statement that it is possible 
in general to simultaneously choose an element from each left coset of H in G 
is actually equivalent to the Axiom of Choice. We shall only do this, however, 
in cases where the index of H in G is finite, so we do not need to worry about 
this axiom here.) Thus, the size of any complete set of coset representatives 
for H in G is equal to the index of H in G. 


We have remarked that results which are true for left cosets are generally 
also true for right cosets. Usually it makes no difference whether we work with 
left as opposed to right cosets. At this point, however, it is reasonable to ask 
whether we always get the same result by using right cosets instead of left 
cosets in the definition of the index of a subgroup in a group. The following 
result implies that we do. We note that this result follows by a simple counting 
argument in case the group in question is finite, but our result is not limited 
to finite groups. 


Lemma 8.17. If G is a group and H ≤ G, then the number of left cosets of
H in G equals the number of right cosets of H in G.


Proof. Let ℒ and ℛ denote the sets of all left cosets and right cosets of H in G,
respectively. Let f : G → G be the inverse function, defined by f(x) = x⁻¹.
Then by Exercise 8.6 (a), f induces two functions σ : ℒ → ℛ, σ(L) = f(L)
and τ : ℛ → ℒ, τ(R) = f(R). By Exercise 8.6 (b), f is its own inverse, so σ
and τ are inverses of each other, hence bijective. This completes the proof.
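The bijection used in this proof can be watched in action. The sketch below uses our own example, not one from the text: G = S₃ as tuples of images of 0, 1, 2, and a non-normal subgroup H generated by a transposition, so that the left and right cosets really are different collections of sets. The helper names compose and inverse are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'q first, then p', i.e. i |-> p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    q = [0] * len(p)
    for i, image in enumerate(p):
        q[image] = i
    return tuple(q)

G = list(permutations(range(3)))
H = [(0, 1, 2), (1, 0, 2)]  # identity and the transposition swapping 0 and 1

left = {frozenset(compose(a, h) for h in H) for a in G}
right = {frozenset(compose(h, a) for h in H) for a in G}

# Apply f(x) = x^{-1} elementwise to every left coset.
image_of_left = {frozenset(inverse(x) for x in C) for C in left}
```

Comparing image_of_left with right shows that inversion carries the left cosets exactly onto the right cosets, even though left and right themselves differ.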




8.3 Reaping the Consequences


Our previous work has rich implications for the structure of finite groups. 
Many of these implications follow immediately from the next result. 


Theorem 8.18 (Lagrange’s Theorem). Let G be a finite group. If H ≤ G,
then |H| · [G : H] = |G|, and, in particular, |H| divides |G|.


Proof. Suppose that G is a finite group and H ≤ G. By Theorem 8.7 part (ii),
we have

G = ⋃̇_{C ∈ ℒ} C,

where ℒ is the set of all left cosets of H in G. Therefore,
|G| = Σ_{C ∈ ℒ} |C|. By Lemma 8.8, we have |C| = |H| for all C in ℒ. It follows
that |G| = |H| · |ℒ|. The result now follows from the definition of [G : H].
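Lagrange's Theorem is easy to confirm numerically in a small case. The sketch below assumes the additive group Z₁₂, an example of our own choosing, and uses hypothetical helpers cyclic_subgroup and index; the index is counted directly as the number of distinct cosets rather than computed by division, so the check is not circular.

```python
n = 12
G = range(n)

def cyclic_subgroup(g):
    """The subgroup of (Z_n, +) generated by g."""
    H, x = {0}, g % n
    while x not in H:
        H.add(x)
        x = (x + g) % n
    return frozenset(H)

def index(H):
    """[G : H], counted as the number of distinct cosets a + H."""
    return len({frozenset((a + h) % n for h in H) for a in G})

# For each generator g, record the pair (|H|, [G : H]) with H = <g>.
checks = [(len(cyclic_subgroup(g)), index(cyclic_subgroup(g))) for g in G]
```

Each pair in checks multiplies to 12. For instance, the subgroup generated by 8 is {0, 4, 8}, which has order 3 and index 4.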


Corollary 8.19. If G is a finite group and N ⊴ G, then |G/N| = |G|/|N|.


Proof. By the definitions of G/N (in Theorem 7.26) and [G : N] (in Definition 
8.12 and Notation 8.13), we have |G/N| = [G: N]. The result now follows 
from Lagrange’s Theorem. 


Lagrange’s Theorem provides one of the most basic and important tools 
in the study of finite groups. As an example of its power, we next show that 
any finite group of prime order has a very special form. 


Theorem 8.20. Every group of prime order is cyclic. 


Proof. Suppose that G is a group of order p, where p is prime. Then |G| > 1,
so ∃x ∈ G − {e}. Let H = ⟨x⟩ and let n = |H|. By Lagrange’s Theorem, we
have n | p. Also, we have n > 1, since x ≠ e and both e and x belong to H.
Therefore, n = p = |G|. Since H ⊆ G, it follows that H = G. Thus ⟨x⟩ = G,
so G is cyclic.


Another corollary of Lagrange’s theorem involves cyclic subgroups of a 
finite group. Before presenting this result, we make a standard definition: 


Definition 8.21. Let G be a group and let x ∈ G. The order of x is the order
of the cyclic subgroup generated by x.


Notation 8.22. We denote the order of x by |x|.


Remark 8.23. By definition of the order of x, we have |x| = |⟨x⟩|. Just as
with the index of a subgroup, the order of a group element may be a positive
integer or it may be infinite.


Here finally is the promised result: 


Lemma 8.24. Let G be a finite group, and let x ∈ G. Then
|x| = min{n ∈ Z | n > 0 and xⁿ = e},


and |x| divides |G|.




Proof. Since G is finite, then by Exercise 5.8, there is a positive integer k
such that x^k = e. Now the formula for |x| follows from Exercise 5.7 and the
definition of |x|. That |x| divides |G| follows from the definition of |x| and
Lagrange’s Theorem.
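Both descriptions of the order of an element can be compared directly in a small group. The sketch below takes G = S₃ as an example of our own, modeled as tuples of images of 0, 1, 2; the helpers power, order, and cyclic_subgroup are hypothetical names, not notation from the text.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'q first, then p', i.e. i |-> p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))

G = list(permutations(range(3)))
e = (0, 1, 2)

def power(x, k):
    """x^k for an integer k >= 0, by repeated multiplication."""
    result = e
    for _ in range(k):
        result = compose(result, x)
    return result

def order(x):
    """Least n > 0 with x^n = e; the search ends because G is finite."""
    n = 1
    while power(x, n) != e:
        n += 1
    return n

def cyclic_subgroup(x):
    """All powers of x, collected until they start to repeat."""
    seen, y = {e}, x
    while y not in seen:
        seen.add(y)
        y = compose(y, x)
    return seen

orders = sorted(order(x) for x in G)
```

For every x in this group, order(x) agrees with the size of cyclic_subgroup(x), and each order divides 6.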


Corollary 8.25. If G is a finite group and x ∈ G, then x^{|G|} = e.


Proof. By Lemma 8.24, we can write |G| = |x|·k for some k ∈ Z. Thus,
x^{|G|} = x^{|x|·k} = (x^{|x|})^k = e^k = e.


Example 8.26. Suppose that G is a group of order 7. Since 7 is prime, Theorem
8.20 implies that G = ⟨x⟩ for some x ∈ G. So we must have |x| = |G| =
7, by definition of |x|. Now by Lemma 8.24, we have x⁷ = e. Thus, G =
{e, x, x², x³, x⁴, x⁵, x⁶} by Lemma 5.6. The reader can check that to multiply
two elements of G written as powers of x, we add the powers modulo 7:
x^a · x^b = x^c, where c ≡ a + b (mod 7). Thus, knowing only that a group has
order 7, we have been able to essentially write down a formula for the group
law! The main result behind this analysis is Lagrange’s Theorem.
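One concrete group of order 7 in which to test this formula is the subgroup of S₇ generated by a 7-cycle. That choice is ours and is only one possible model; the sketch below represents permutations as tuples of images of 0, ..., 6, with hypothetical helpers compose and power.

```python
def compose(p, q):
    """Return the permutation 'q first, then p', i.e. i |-> p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))

e = tuple(range(7))
x = (1, 2, 3, 4, 5, 6, 0)  # the 7-cycle sending i to i + 1 modulo 7

def power(g, k):
    """g^k for an integer k >= 0, by repeated multiplication."""
    result = e
    for _ in range(k):
        result = compose(result, g)
    return result

powers = [power(x, k) for k in range(7)]

# x^a * x^b should equal x^c with c = (a + b) mod 7.
law_holds = all(
    compose(powers[a], powers[b]) == powers[(a + b) % 7]
    for a in range(7)
    for b in range(7)
)
```

The seven powers of x are distinct, and multiplying them amounts to adding exponents modulo 7.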


8.4 Exercises 


Exercise 8.1. Let G be a group and let H ≤ G. Prove that [G : H] = 1 iff
H = G. (Prove the general case; do not assume that G is finite!)

Exercise 8.2. Let G be a group, and let H ≤ G. Let a, b ∈ G. Prove that
aH = bH iff a⁻¹b ∈ H.

Exercise 8.3. Prove Lemma 8.6. 


Exercise 8.4. Let f : D → C be a function. Define a relation ~ on D by
a ~ b iff f(a) = f(b).

(a) Prove that ~ is an equivalence relation on D.

(b) Prove that the equivalence classes of D under ~ are the pre-image sets
f⁻¹(y) for y ∈ f(D).
Exercise 8.5. Let ~ be an equivalence relation on a set S. Prove that there 
exists a function f such that ~ is the relation obtained from f via the con- 
struction of Exercise 8.4. 


Exercise 8.6. Let G be a group, and suppose H ≤ G. Define a function
f : G → G by f(x) = x⁻¹ for x ∈ G.

(a) Prove that if C is a left coset of H in G, then f(C) is a right coset of
H in G, and vice versa.

(b) Prove that f ∘ f = id_G, the identity function on G.


Exercise 8.7. Suppose that H ≤ G and [G : H] = 2. Prove that H ⊴ G.




Exercise 8.8. Suppose that we tried to prove Lemma 8.17 by defining a
function σ : ℒ → ℛ using the formula σ(aH) = Ha for a ∈ G, with the same
notation as in the proof of that lemma.

(a) Find an example of a group G and subgroup H ≤ G such that σ is not
well-defined. Hint: take G = S₃.

(b) Prove that in general σ is well-defined iff H ⊴ G.


Exercise 8.9. (a) Why must every subgroup of S₃ have order 1, 2, 3, or 6?
(b) Let a= | € Ss, and let H = (a). Prove that H < Ss. 


: 1 2 3 
Exercise 8.10. Let 6 = | 9 43 
(a) Prove that L @ S3. 

(b) Without doing any more “real” work, deduce what the normalizer of 


L in S3 must be (see Exercise 7.16). 


€ $3, and let L = (8). 


Exercise 8.11. Let G be a finite cyclic group of order n, and let a ∈ G be a
generator of G. Suppose that H ≤ G, and set E = {j ∈ Z : j ≥ 1 and a^j ∈
H}. Let r = |H|.

(a) In Exercise 5.6, assuming r > 1 it was shown that E is non-empty and
that H = ⟨a^m⟩, where m = min(E). Prove this is also true under our present
hypotheses even if r = 1.

(b) Why must r divide n?

(c) Prove that n divides mr. (Hint: see Exercise 5.7.)

(d) Let q = n/r, and set H′ = ⟨a^q⟩. Show that q | m, and deduce that
a^m ∈ H′, hence H ≤ H′.

(e) Prove that |a^q| = r, and deduce that H = H′. Thus, ⟨a^(n/r)⟩ is the unique
subgroup of G of order r.


Exercise 8.12. Once upon a time, a mathematician attempted to recall the
definition of the normalizer of a subgroup H in a group G, but made a mistake,
and instead defined


B_G(H) = {g ∈ G : gHg⁻¹ ⊆ H}.


(Compare this to the correct definition of the normalizer N_G(H) in Exercise
7.16.) In this exercise, you will show that in fact B_G(H) is not always a
subgroup of G, much less a normalizer subgroup. The idea is to find a group
G containing two elements x and y such that


yxy⁻¹ = x², (8.1)


so that y will be in B_G(⟨x⟩). Then we will take H = ⟨x⟩. Equation 8.1 seems
to say that conjugating x by y will double the power of x, and so conjugating
x by y⁻¹ should cut the power of x in half. But if the group H does not
contain a “square root” of x, then y⁻¹ will not lie in B_G(H)!
(a) Suppose that x, y ∈ G and Equation 8.1 holds. Prove by induction that
we have

y x^k = x^(2k) y (8.2)

for every non-negative integer k.
(b) Making use of Equation 8.2, prove that we have


y^b x^a = x^(a·2^b) y^b (8.3)


for all integers a, b with b ≥ 0.
(c) Use Equation 8.3 to show that we have


(x^a y^b)(x^α y^β) = x^(a + α·2^b) y^(b + β) (8.4)


for all integers a, b, α, β with b ≥ 0.
(d) Motivated by Equation 8.3, define


G = Q × Z
with binary operation · given by
(a, b) · (α, β) = (a + α·2^b, b + β).


Prove that (G, ·) is a group.

(e) With G as in part (d) above, set x = (1, 0) and y = (0, 1). Let H = ⟨x⟩.
Prove that H = {(n, 0) : n ∈ Z}.

(f) Using the notation of part (e) above, prove that y ∈ B_G(H) but y⁻¹ ∉
B_G(H). Conclude that B_G(H) is not a group.

(g) Prove that if G is any finite group, and H ≤ G, then B_G(H) = N_G(H),
and thus, in particular, B_G(H) ≤ G. Hint: Lemma 8.8 and Remark 8.1 can
be used here.


9 


Special Types of Homomorphisms 


9.1 Isomorphisms 


Consider the two groups 
Z₃ = {0 + 3Z, 1 + 3Z, 2 + 3Z}
and 
c=(| 3°12 ) < S3. 


For convenience, let us use the notation 0 = 0 + 3Z, 1 = 1 + 3Z, 2 = 2 + 3Z,
and let α denote the generator of G displayed above. Let β = α². Then we have
Z₃ = {0, 1, 2} and G = {α, β, e}. Constructing group tables for each group,
we find:


Z₃:

  + | 0  1  2
 ---+---------
  0 | 0  1  2
  1 | 1  2  0
  2 | 2  0  1

G:

  · | e  α  β
 ---+---------
  e | e  α  β
  α | α  β  e
  β | β  e  α


First of all, each group has order 3. More than this, the patterns in the
two tables are identical: only the labels of the elements are different. If we
were to re-label 0 as e, 1 as α, and 2 as β, then the group table for Z₃ would
be the same as the group table for G. Except for how we happened to label the
elements, the two groups are identical.

Let us find a precise mathematical statement which captures this notion of
two groups (G₁, ·₁) and (G₂, ·₂) being the same except for how their elements
are labeled. First, we need to formalize the notion of “re-labeling” the elements
of G₁ to match those of G₂. Actually, all we need to do is to associate each
element of G₁ with a unique element of G₂ and vice versa, and this can be
accomplished by giving a bijection


σ : G₁ → G₂.


DOI: 10.1201/9781003252139-9




In our example above, we would take σ(0) = e, σ(1) = α, and σ(2) = β. With
this identification of the elements of the two groups, what does it mean to
say that the group tables are identical? Informally, in our example above,
notice that we have “σ(Table for G₁) = Table for G₂.” More formally, suppose
that the table for G₁ contains a row labeled by the element x and a column
labeled by y:


There is some element z of G₁ in row x and column y. Meanwhile, the
corresponding row and column in the table for G₂ are labeled by some elements a
and b of G₂, and the corresponding entry is some element c of G₂:


Now, to say that the row for a corresponds to the row for x just means that
σ(x) = a. Similarly, σ(y) = b. To say that the group tables are identical after
re-labeling the elements of G₁ via σ just means that, in addition, σ(z) = c.
But we also have x ·₁ y = z and a ·₂ b = c, because of how a group table
is constructed. Therefore, σ(x ·₁ y) = σ(z) = c = a ·₂ b = σ(x) ·₂ σ(y).
This equation exactly says that σ is a group homomorphism! (We rely on the
fact that x and y are arbitrary elements of G₁.) In fact, the only difference
between our requirements for σ and our requirements for a general group
homomorphism from G₁ to G₂ is that we also required σ to be a bijection.


Definition 9.1. Let G1 and G2 be groups. An isomorphism from G1 to G2
is a bijective group homomorphism from G1 to G2.


Notation 9.2. We write G1 ≅ G2 if there exists an isomorphism from G1 to
G2, and we say in this case that G1 is isomorphic to G2, or that G1 and G2
are isomorphic (to each other).


Remark 9.3. Again, we interpret the condition G1 ≅ G2 as saying that G1
and G2 are identical as groups, except for how their elements happen to be
labeled. An isomorphism σ : G1 → G2 tells us how to identify elements of
G1 with elements of G2 so that the group laws on G1 and G2 are the same.
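
The re-labeling idea can be checked mechanically on the running example. The sketch below is only an illustration: it assumes that the table for G = {e, a, b} is the Z3 table with 0, 1, 2 re-labeled as e, a, b, and the helper name is_isomorphism is hypothetical, not notation from the text.

```python
from itertools import product

# Z3 under addition mod 3.
Z3 = [0, 1, 2]

def add3(x, y):
    return (x + y) % 3

# Assumed table for G = {e, a, b}: the Z3 table with 0, 1, 2 re-labeled as e, a, b.
G = ["e", "a", "b"]
G_TABLE = {
    ("e", "e"): "e", ("e", "a"): "a", ("e", "b"): "b",
    ("a", "e"): "a", ("a", "a"): "b", ("a", "b"): "e",
    ("b", "e"): "b", ("b", "a"): "e", ("b", "b"): "a",
}

def mul_g(x, y):
    return G_TABLE[(x, y)]

def is_isomorphism(sigma, dom, op_dom, cod, op_cod):
    """True if the dict sigma is a bijective homomorphism from dom to cod."""
    bijective = sorted(sigma[x] for x in dom) == sorted(cod)
    homomorphic = all(
        sigma[op_dom(x, y)] == op_cod(sigma[x], sigma[y])
        for x, y in product(dom, repeat=2)
    )
    return bijective and homomorphic

sigma = {0: "e", 1: "a", 2: "b"}   # the re-labeling from the text
bad = {0: "a", 1: "e", 2: "b"}     # a bijection that is not a homomorphism

print(is_isomorphism(sigma, Z3, add3, G, mul_g))
print(is_isomorphism(bad, Z3, add3, G, mul_g))
```

The second map shows that bijectivity alone is not enough: a re-labeling must also carry one table onto the other.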


Remark 9.4. Since an isomorphism is bijective, a necessary condition for two 
groups to be isomorphic is that they have the same order. However, this 
condition is not sufficient: see Exercise 9.2. 


Isomorphism expresses the idea of “sameness” in a way that renders the 
labeling of group elements irrelevant. The following result says that this notion 
of sameness enjoys three properties that any good notion of sameness should 




have, namely, those properties which make isomorphism into an equivalence 
relation (see Definition 8.5). 


Lemma 9.5. Let G1, G2, G3 be groups. Then:
(1) id_G1 : G1 → G1 is an isomorphism, where id_G1 is the identity map on
G1;
(2) If σ : G1 → G2 is an isomorphism, then σ^(-1) : G2 → G1 is an
isomorphism;
(3) If σ : G1 → G2 and τ : G2 → G3 are isomorphisms, then τ ∘ σ :
G1 → G3 is an isomorphism.
In particular, we have:
G1 ≅ G1;
G1 ≅ G2 ⟹ G2 ≅ G1;
G1 ≅ G2 and G2 ≅ G3 ⟹ G1 ≅ G3.


Proof. (1) Left to the reader as Exercise 9.1. 
(2) This is the content of Exercise 7.10. 
(3) This follows from Exercise 7.9 and the fact that the composition of two 

bijections is a bijection. 


Remark 9.6. The equivalence class of a group according to the relation of iso- 
morphism is known as the isomorphism class of the group. When we speak of 
identifying a group up to isomorphism, we mean identifying the isomorphism 
class of the group. Often, we accomplish this task by finding a particular 
“known” group which is isomorphic to the group in question. 


In Theorem 7.30, we saw that there is a special “natural” group homomor- 
phism from a group to a quotient of that group. Now that we understand the 
concept of an isomorphism of groups, we can revisit the relationship between 
group homomorphisms and quotient groups, and prove that the image of an 
arbitrary group homomorphism “is” (is isomorphic to) a quotient group. 


Theorem 9.7 (The Fundamental Theorem of Group Homomorphisms). If
σ : G1 → G2 is any group homomorphism, then σ(G1) ≅ G1/ker(σ). More
specifically, the function τ : G1/K → σ(G1) given by the formula τ(aK) =
σ(a) for a ∈ G1 is a well-defined group isomorphism, where K = ker(σ).


Proof. Suppose that σ : G1 → G2 is a group homomorphism, and set
K = ker(σ), H = σ(G1). We attempt to define a function τ : G1/K → H
by the formula aK ↦ σ(a) for a ∈ G1. Because a coset representative a is not
uniquely determined by the coset it’s in, we must check that τ is well-defined.
So suppose that aK = bK where a, b ∈ G1. Then a ∈ bK (why?), so a = bx
for some x ∈ K. Therefore, σ(a) = σ(bx) = σ(b)σ(x) (since σ is a group
homomorphism) = σ(b) (since x ∈ ker(σ)). This shows that our formula for τ
gives the same result whether we use a or b as a coset representative. So τ is
well-defined.




We next verify that τ is a group homomorphism. For two cosets cK and
dK of K in G1, we compute τ(cK · dK) = τ((cd)K) = σ(cd) = σ(c)σ(d) (since
σ is a group homomorphism) = τ(cK)τ(dK).

Finally, we check that τ is bijective.

[Show τ is surjective] Let h ∈ H. Then h = σ(a) for some a ∈ G1, by
definition of H as σ(G1). So τ(aK) = σ(a) = h. Thus, τ is surjective.

[Show τ is injective] Suppose that C, D ∈ G1/K and τ(C) = τ(D). Then
we can write C = aK and D = bK for some a, b ∈ G1. So τ(aK) = τ(bK).
[We must show that aK = bK] Then σ(a) = τ(aK) = τ(bK) = σ(b). Now
σ(a) = σ(b) =: c ∈ G2. By Lemma 7.22, we have σ^(-1)(c) = aK = bK. So τ is
injective.

Now we know that τ is an isomorphism from G1/K to H, so G1/ker(σ) ≅
σ(G1). By Lemma 9.5, we can also write σ(G1) ≅ G1/ker(σ).
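
To see the steps of this proof in a concrete case, here is a small sketch. The choice of group (Z12 under addition) and of the homomorphism x ↦ 3x are assumptions made purely for illustration; the code checks that the induced map on cosets is well-defined, bijective onto the image, and a homomorphism.

```python
N = 12
G1 = list(range(N))

def sigma(x):
    """The homomorphism x -> 3x on Z12 (an illustrative choice)."""
    return (3 * x) % N

kernel = frozenset(x for x in G1 if sigma(x) == 0)
image = sorted({sigma(x) for x in G1})

def coset(a):
    """The coset a + K, stored as a frozenset so equal cosets compare equal."""
    return frozenset((a + k) % N for k in kernel)

cosets = {coset(a) for a in G1}

# tau is well-defined: sigma takes a single value on each coset.
well_defined = all(len({sigma(a) for a in c}) == 1 for c in cosets)

# tau(a + K) = sigma(a), computed from one representative of each coset.
tau = {c: sigma(min(c)) for c in cosets}

is_bijective = sorted(tau.values()) == image
is_homomorphism = all(
    tau[coset(a + b)] == (tau[coset(a)] + tau[coset(b)]) % N
    for a in G1 for b in G1
)

print(sorted(kernel), image, len(cosets))
print(well_defined, is_bijective, is_homomorphism)
```

Here the number of cosets of the kernel equals the size of the image, as the theorem predicts.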


Remark 9.8. We refer to τ in the proof above as the homomorphism induced
by σ on G1/K.


Remark 9.9. One way to view Theorem 9.7 is that an arbitrary group homomorphism

σ : G1 → G2


“factors” as a composition of two group homomorphisms

G1 --ν--> G1/ker(σ) --σ̄--> G2,


where ν is the natural map, which is surjective, and σ̄ is the map induced by σ
on G1/ker(σ), which is injective. By saying that σ “factors,” we mean that σ =
σ̄ ∘ ν. Just as integers can factor under the binary operation of multiplication,
functions—in this case, group homomorphisms—can factor under the binary 
operation of composition! 

A typical use of the isomorphism concept is to try to understand some 
group under study by showing that it is isomorphic to a well-understood 
group. The following results, culminating in Corollary 9.12, illustrate one such 
situation. 


Lemma 9.10. Fr({x}) ≅ (Z, +).


Proof. Let F = Fr({x}). Then F = {x^j : j ∈ Z}. Define σ : F → Z
by x^j ↦ j for each j in Z. We verify that σ is a group homomorphism: for
all j, k ∈ Z, we have σ(x^j · x^k) = σ(x^(j+k)) = j + k = σ(x^j) + σ(x^k). It is
straightforward to verify that σ is bijective.


Theorem 9.11. Every cyclic group is isomorphic to a quotient of (Z,+). 


Proof. Let G be a cyclic group. Then G = ⟨a⟩ for some a ∈ G. Let
F = Fr({x}). By Example 7.14, with T = {a}, we have a surjective group
homomorphism τ : F → G with τ(x) = a. By Lemma 9.10, we have F ≅ Z;
so by Lemma 9.5, we also have Z ≅ F via some isomorphism σ. Thus we have
a sequence of two surjective group homomorphisms


Z --σ--> F --τ--> G.

Let h = τ ∘ σ. Then h is surjective since both τ and σ are surjective; and by
Exercise 7.9, h : Z → G is a group homomorphism. So by Theorem 9.7, we
have G ≅ Z/ker(h). Thus, G is isomorphic to a quotient of Z under addition,
as required.


Corollary 9.12. Every cyclic group is isomorphic to either (Z,+) or to Zn 
for some positive integer n. 


Proof. Let G be a cyclic group. Then by Theorem 9.11, G ≅ Z/K for some
normal subgroup K ⊴ (Z, +). Now (Z, +) is cyclic (generated by 1, for example).
So by Exercise 5.6, we know that K is cyclic; so K = ⟨n⟩ = nZ for some
n ∈ Z. Since nZ = (−n)Z, we may assume that n ≥ 0. Now if n > 0, then
G ≅ Z/nZ = Zn by definition of Zn; while if n = 0, then G ≅ Z/{0} ≅ Z by
Exercise 9.1.
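
As a hedged illustration of the corollary, the sketch below takes a cyclic group that is not presented as Zn and finds the n for which it is isomorphic to Zn. The group used, the subgroup generated by 3 inside the nonzero residues mod 13 under multiplication, is an assumption chosen for the example, and the function name cyclic_subgroup_order is hypothetical.

```python
def cyclic_subgroup_order(a, p):
    """Order of a in the multiplicative group of nonzero residues mod a prime p.

    Assumes a is not divisible by p, so that some power of a equals 1.
    """
    n, power = 1, a % p
    while power != 1:
        power = (power * a) % p
        n += 1
    return n

p, a = 13, 3
n = cyclic_subgroup_order(a, p)   # the kernel of h(k) = a^k is nZ
subgroup = sorted({pow(a, k, p) for k in range(n)})

# The induced map k + nZ -> a^k from Zn to the subgroup generated by a.
tau = {k: pow(a, k, p) for k in range(n)}
is_bijective = sorted(tau.values()) == subgroup
is_homomorphism = all(
    tau[(j + k) % n] == (tau[j] * tau[k]) % p
    for j in range(n) for k in range(n)
)

print(n, subgroup, is_bijective, is_homomorphism)
```

The map h : Z → ⟨a⟩ given by k ↦ a^k plays the role of h in the proof of Theorem 9.11, and its kernel is nZ with n the order of a.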


Corollary 9.12 may be viewed as a “classification theorem”: it identifies 
each cyclic group as a well-understood group (either Z or Zn), up to isomorphism.
For each possible order 1, 2, 3, ..., or ∞, there is exactly one
corresponding isomorphism class of cyclic group (compare Remark 9.4). 

Example 9.13. Let G be a group of order 7; as in Example 8.26, we know that
G must be cyclic. Using Corollary 9.12, we can now say that G ≅ Z7. Thus,
there is only one group of order 7, up to isomorphism! By contrast, we could
try to classify all the groups of order 7 by starting with a set with 7 elements
and writing out every possible 7 × 7 binary operation table. The number of
distinct binary operations on a set of order 7 is equal to


7^49 = 256923577521058878088611477224235621321607.


After writing out these tables, we would proceed to eliminate those which fail 
to satisfy at least one of the group axioms. Finally, we would identify any two 
group tables which represent isomorphic groups. That would leave us with 
exactly one table, equivalent to the table for Z7. 
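
The count in this example comes from a simple rule: a binary operation table on a set of order n has n × n entries, and each entry can be any of the n elements. A short sketch of that arithmetic (the variable names are illustrative only):

```python
n = 7
cells = n * n          # number of entries in an n-by-n operation table
count = n ** cells     # each entry is chosen freely among n elements

print(cells)
print(count)
print(len(str(count)), "digits")
```

Python integers have arbitrary precision, so the full value of 7^49 is computed exactly.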


9.2 Automorphisms 


Consider again the group table for Z3, using the notation of Section 9.1. We 
had 


Z3:

  + | 0  1  2
 ---+---------
  0 | 0  1  2
  1 | 1  2  0
  2 | 2  0  1



Now consider the result of swapping 1 and 2 everywhere in this table. We get 
a new table: 


  + | 0  2  1
 ---+---------
  0 | 0  2  1
  2 | 2  1  0
  1 | 1  0  2


The pattern of labels in the second table is identical to the pattern of labels in 
the first table. This leads us to the conclusion that the function σ : Z3 → Z3
given by 0 ↦ 0, 1 ↦ 2, 2 ↦ 1 is an isomorphism from Z3 to Z3.

From another point of view, we can construct the second table from the 
first by swapping the rows labeled by 1 and 2, and also swapping the columns 
labeled by 1 and 2. Even though the second table itself is different from the 
first table, there is some sense in which the swaps have no effect: namely, if we 
construct a binary operation from the second table, it will be identical to the 
binary operation given by the first table. This suggests that we are dealing 
with a kind of symmetry here. 

Indeed, we can formally acknowledge our swap to be a symmetry according
to Definition 5.1. Namely, let (for convenience) (G, ·) = (Z3, +), let S = G^3 =
G × G × G, and let X = {(a, b, a · b) | a, b ∈ G}. Then the isomorphism σ
induces a permutation σ̂ of S via the formula


σ̂ : (a, b, c) ↦ (σ(a), σ(b), σ(c)),     (9.1)


and σ̂ is a symmetry of X with respect to S (you are asked to prove this in
Exercise 9.3). Now X just corresponds to the group table for G, so we may
view σ̂ as a symmetry of the group table. This entire discussion generalizes to
the situation when we have an isomorphism from any group G to itself, and 
it is important enough to be given a name: 


Definition 9.14. Let G be a group. An automorphism of G is an isomorphism
from G to G. 


Remark 9.15. An automorphism of G corresponds to a symmetry of the group 
G: a permutation of the elements of G which leaves the group law for G 
unchanged. Just as a group element can represent a symmetry of an object, 
an automorphism represents a symmetry of a group! This suggests that the 
set of all automorphisms of a given group may itself be a group, and this is in 
fact the case (see Exercise 9.8). 
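
For a small group, the automorphisms can be found by brute force. The following sketch, restricted to Z3 as an assumption for illustration, tests every permutation of the elements against the homomorphism law and also checks that the swap of 1 and 2 maps the set X of triples (a, b, a + b) onto itself.

```python
from itertools import permutations, product

n = 3
G = list(range(n))

def op(x, y):
    return (x + y) % n

def is_automorphism(sigma):
    """sigma is a dict G -> G that is already bijective; test the homomorphism law."""
    return all(
        sigma[op(x, y)] == op(sigma[x], sigma[y])
        for x, y in product(G, repeat=2)
    )

automorphisms = [
    dict(zip(G, images))
    for images in permutations(G)
    if is_automorphism(dict(zip(G, images)))
]

# X encodes the group table as triples (a, b, a + b).
X = {(a, b, op(a, b)) for a, b in product(G, repeat=2)}

swap = {0: 0, 1: 2, 2: 1}
image_of_X = {(swap[a], swap[b], swap[c]) for (a, b, c) in X}

print(automorphisms)
print(image_of_X == X)
```

Trying all permutations is only feasible for very small groups, since a set of order n has n! permutations.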


9.3 Embeddings


We have seen that when we require a group homomorphism to be bijective, 
then it acquires a special status, that of an isomorphism, which guarantees us 




that the domain and codomain have the same structure as groups, although 
the elements are perhaps labeled differently. Yet the concept of “codomain,” 
as we have observed, is somewhat artificial: it is not really intrinsic to a func- 
tion. For this reason, of the two conditions, injectivity and surjectivity, which 
combine to make bijectivity, the condition of injectivity is the more essential. 


Definition 9.16. An embedding is an injective homomorphism. 


Notation 9.17. We write σ : H ↪ G to indicate that σ is an embedding
of H into G. In this situation, we say that H embeds in G or that H can be
embedded in G. Sometimes we do not name our embedding, but write merely
H ↪ G to mean that H embeds in G.


The significance of this notion is that when we consider an embedding 
σ : H ↪ G, we can let L = σ(H), so that σ gives a bijective homomorphism
(an isomorphism) from H to L. Note that L ≤ G by Theorem 7.13. Thus, to
say that H embeds in G is to say that H is isomorphic to a subgroup of G. 

We have noted that the kernel of a group homomorphism measures how 
much information is lost in passing from the domain to the image. One mani- 
festation of this principle is the following lemma, which represents the extreme 
case when a kernel is as small as possible: namely, the case of an embedding, 
in which no information is lost. 


Lemma 9.18. A group homomorphism is an embedding iff its kernel is the 
trivial subgroup, {e}. 


Proof. Exercise 9.14. 
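
Without giving away the proof, the lemma can at least be tested on examples. The sketch below assumes the family of homomorphisms x ↦ a·x on Z12 (an illustrative choice) and compares, for each a, whether the kernel is trivial with whether the map is injective.

```python
n = 12
G = range(n)

def make_hom(a):
    """The homomorphism x -> a*x on Z12."""
    return lambda x: (a * x) % n

results = {}
for a in G:
    f = make_hom(a)
    kernel = {x for x in G if f(x) == 0}
    injective = len({f(x) for x in G}) == n
    results[a] = (kernel == {0}, injective)

agree = all(trivial == injective for trivial, injective in results.values())
embeddings = [a for a, (trivial, _) in results.items() if trivial]

print(agree)
print(embeddings)
```

A check over finitely many examples is of course not a proof; it only confirms that the two conditions agree in every case tried.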


Earlier, we remarked that “the fundamental way (some might say the only 
way) that two groups can be related is via a group homomorphism from one to 
the other.” To those who objected that another way two groups can be related 
is that one group could be a subgroup of another, we can now respond as 
follows: The subgroup relation is essentially the relation of one group embed- 
ding in another group, and is thus a special kind of homomorphism relation. 
More precisely, one group is a subgroup of a second group if and only if their 
underlying sets have a subset relationship which gives rise to a group homo- 
morphism (see Exercise 9.4). 


9.4 Exercises 


Exercise 9.1. Let G be a group. 
(a) Prove that the identity map from G to itself is an isomorphism. 
(b) Let H = {e}. Prove that H ⊴ G and that G/H ≅ G.




Exercise 9.2. (a) Pat thinks that two groups G and H are not isomorphic iff
∃σ : G → H such that σ is not an isomorphism. Is Pat correct?
(b) Find, with proof, two groups of order 4 which are not isomorphic. 


Exercise 9.3. Prove that the map σ̂ defined by Equation 9.1 is indeed a symmetry
of X with respect to S (see the context of this equation for definitions
of X and S). Make your proof general: only use the fact that σ is an automorphism
of a group G; do not assume that G = Z3.

Exercise 9.4. Let (G, ·) be a group, and let S ⊆ G. Suppose that (S, △)
forms a group. Let σ : G → G be the identity function on G, and let
τ : S → G be the restriction of σ to S. Prove that (S, △) ≤ (G, ·) iff τ is a
group homomorphism from (S, △) to (G, ·).

Exercise 9.5. Prove that if n and m are positive integers with n ≤ m, then
Sn ↪ Sm (see Notation 3.30).

Exercise 9.6. Let S and T be two sets such that |S| = |T|; recall that this
means there is a bijection from S to T. Prove that Sym(S) ≅ Sym(T). (You
should not assume that S and T are finite.)

Exercise 9.7. Use group tables for the groups D6 and S3 to construct an
isomorphism between these groups. 

Exercise 9.8. Let G be a group. Define Aut(G) to be the set of all automor- 
phisms of G, equipped with the binary operation given by composition of 
functions. Prove that Aut(G) ≤ Sym(G). In particular, Aut(G) is a group.

Exercise 9.9. Let G be a group and let g ∈ G. Define a function φ_g : G → G
by the formula φ_g(x) = g x g^(-1) for x ∈ G. Prove that φ_g is an automorphism
of G. We call φ_g the inner automorphism determined by g, or the conjugation
by g map. 

Exercise 9.10. Let G be a group. Let 


Inn(G) = {φ_g : g ∈ G},


the set of all inner automorphisms of G. Prove that Inn(G) ≤ Aut(G). (Refer
to Exercises 9.8 and 9.9.) 


Exercise 9.11. Let G be a group. Define 
Z(G) = {x ∈ G : ∀g ∈ G, gx = xg}.


Z(G) is called the center of G. 

(a) Prove that Z(G) ⊴ G.

(b) Generalize part (a) by proving that for all H ≤ Z(G), we have H ⊴ G.

(c) Complete the following statement: G is abelian iff Z(G) = ___

Exercise 9.12. Let G be a group and let Z(G) be the center of G. Define a
function f : G → Inn(G) by the formula g ↦ φ_g. (Refer to Exercises 9.10
and 9.11.) 

(a) Prove that f is a group homomorphism. 

(b) Prove that Inn(G) ≅ G/Z(G).




Exercise 9.13. This exercise refers to Exercise 4.10. Let H = {A ∈ G : |A| =
1}. Prove that H ⊴ G and G/H ≅ R^×.


Exercise 9.14. Prove Lemma 9.18. 


Exercise 9.15. Let σ : G → H be a group homomorphism, and set K =
ker(σ). Suppose that N ⊴ G and N ⊆ K.

(a) Prove that for a ∈ G, the formula σ̄(aN) = σ(a) gives a well-defined
group homomorphism σ̄ : G/N → H.

(b) Prove that we can write σ = σ̄ ∘ ν where ν : G → G/N is the natural
map. We say that σ factors through G/N.


Exercise 9.16 (Automorphisms of Zn). Let n be a positive integer, and let
G = Zn. Let u = 1 + nZ ∈ G. Let a ∈ Z, and set ā = a + nZ ∈ G.

(a) Suppose that σ ∈ Aut(G) and σ(u) = ā. Prove that we have σ(ku) =
σ(k + nZ) = kā = (ka) + nZ for each k ∈ Z. Conclude that there is at most
one automorphism of G which sends u to ā.

(b) Prove that the function f : G → G given by the formula f(k + nZ) =
(ka) + nZ is a well-defined group homomorphism.

(c) Prove that the map f from part (b) above is an automorphism of G
if and only if gcd(a, n) = 1. Hint: Since Zn is finite, f is bijective iff f is an
embedding iff ker(f) = {0 + nZ}, by Lemma 9.18.

Exercise 9.17. One day in the middle of an abstract algebra lecture, the in- 
structor pointed out that the definition of the quotient group G/N apparently 
never relied on the properties of N as a group, but only on the fact that N is 
normal, that is, closed under conjugation. Could this be true? And if so, does 
it give us a whole new world of groups to explore? This exercise investigates 
these questions. 

Let G be a group. For a subset S ⊆ G, let us say that S is closed under
conjugation (by elements of G) if ∀x ∈ S ∀g ∈ G, g x g^(-1) ∈ S. (Note that we
do not require S to be a subgroup of G.) Let 𝒞 denote the set of all left cosets
of S in G, and suppose that S is closed under conjugation by elements of G.

(a) Prove Lemma 7.25 with S in place of N: that is, prove that for all
a ∈ G, we have aS = Sa.

(b) Consider the formula (xS) · (yS) = (xy)S for x, y ∈ G. Prove that this
formula gives a well-defined group law on 𝒞.

(c) Define a function σ : G → 𝒞 by the formula σ(x) = xS. Prove that σ
is a surjective group homomorphism.

(d) Use part (c) above to show that 𝒞 is isomorphic to an ordinary quotient
group of G. Thus our construction does not “really” yield any new groups.




10 
Making Groups 


10.1 Introduction 


One way of producing groups is by starting with a group, taking a subset of 
the group, and generating a subgroup from this subset. We have seen (Theo- 
rem 4.12) that there is always a unique smallest subgroup containing a given 
subset. In general, looking for subgroups within a given group is a promising 
way to find “new” groups. 

Another way to produce groups is to start with a group, find a normal 
subgroup of the group, and then form the quotient group. 

In this chapter, we explore each of these two ideas in turn. 


10.2 A Quotient Engine


In order to produce normal subgroups on demand, it will be helpful to have 
some results on generating normal subgroups similar to those we obtained for 
ordinary subgroups. We start with the counterpart for normal subgroups of 
Lemma 4.20. 


Lemma 10.1. Let G be a group and let 𝒞 be a non-empty collection of normal
subgroups of G. Then ⋂_{H ∈ 𝒞} H ⊴ G.


Proof. Set I = ⋂_{H ∈ 𝒞} H. By Lemma 4.20, we have I ≤ G. Let x ∈ G and let
y ∈ I. [Show xyx^(-1) ∈ I] [This means ∀H ∈ 𝒞, xyx^(-1) ∈ H] Let H ∈ 𝒞. Then
y ∈ H, by definition of I. Since H ⊴ G and x ∈ G, we have xyx^(-1) ∈ H. But
since H was an arbitrary member of 𝒞, we have xyx^(-1) ∈ ⋂_{H ∈ 𝒞} H = I. This
shows that I ⊴ G.


We can use the preceding result to construct normal subgroups with a 
“top-down” approach: 


DOI: 10.1201/9781003252139-10 91 




Definition 10.2. Let G be a group and let S ⊆ G. The normal subgroup of
G generated by S is

((S)) = ⋂_{N ⊴ G, S ⊆ N} N.


Remark 10.3. Since we may take N = G, we are intersecting a non-empty
collection here. We therefore have ((S)) ⊴ G by Lemma 10.1. By construction,
every normal subgroup of G which contains S also contains ((S)). Thus, ((S))
is the smallest normal subgroup of G which contains S.


Remark 10.4. There is also a “bottom-up” construction of ((S)); see Exercise 
10.1. 
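
For a finite group, the bottom-up description from Exercise 10.1 gives a practical way to compute ((S)): collect all conjugates of elements of S, then generate a subgroup from them. The sketch below is one possible illustration, assuming permutations of {0, 1, 2} stored as tuples; the helper names are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Composition of permutations stored as tuples: (p o q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def generated_subgroup(gens, identity):
    """Closure of gens under composition; this suffices in a finite group."""
    elems, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(x, g)
            if y not in elems:
                elems.add(y)
                frontier.append(y)
    return elems

def normal_closure(S, G, identity):
    """Subgroup generated by all conjugates x y x^-1 with x in G and y in S."""
    T = {compose(compose(x, y), inverse(x)) for x in G for y in S}
    return generated_subgroup(T, identity)

e = (0, 1, 2)
S3 = set(permutations(range(3)))
transposition = (1, 0, 2)
three_cycle = (1, 2, 0)

closure_of_transposition = normal_closure({transposition}, S3, e)
closure_of_three_cycle = normal_closure({three_cycle}, S3, e)

print(len(closure_of_transposition), len(closure_of_three_cycle))
```

In this example a single transposition already forces the whole group, while a 3-cycle generates a proper normal subgroup of order 3.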


Before we put these notions to work, we establish a useful multiplicative 
formula for the index of a “nested” subgroup: that is, a subgroup of a subgroup. 


Lemma 10.5. Suppose that H1 ≤ H2 ≤ G and that [G : H1] < ∞. Then we
have


[G : H1] = [G : H2] · [H2 : H1].


Proof. We first note that in case G is finite, the result follows immediately
from Lagrange’s Theorem; for in that case, we have [G : H1] = |G| / |H1|,
[G : H2] = |G| / |H2|, and [H2 : H1] = |H2| / |H1|. However, this argument is
of no use in the general case. Instead, we apply the result we used to prove
Lagrange’s Theorem in the first place.

So suppose that [G : H1] < ∞. Since H1 ≤ H2, Theorem 8.7 lets us write


H2 = ⋃_{a ∈ R1} aH1,     (10.1)


where R1 ⊆ H2 is a complete set of left coset representatives of H1 in H2. Let
us also choose a complete set of left coset representatives R2 of H2 in G, so
that

G = ⋃_{b ∈ R2} bH2.     (10.2)


By definition of the index of a subgroup, we have [H2 : H1] = |R1| and
[G : H2] = |R2|. Note that we do not officially know yet whether these
indexes are finite.

[Idea: G = ⋃_{b ∈ R2} bH2 = ⋃_{b ∈ R2} b(⋃_{a ∈ R1} aH1) = ⋃_{a ∈ R1, b ∈ R2} baH1]

Let 𝒞 be the set of all left cosets of H1 in G. Define a function f :
R1 × R2 → 𝒞 by the formula (a, b) ↦ baH1. We claim that f is bijective. For
surjectivity, let C ∈ 𝒞. By definition of left coset, we can write C = xH1 for
some x ∈ G. By Equation 10.2, we can write x ∈ bH2 for some b ∈ R2. So
x = by for some y ∈ H2. By Equation 10.1, we can write y ∈ aH1 for some
a ∈ R1. So x = by ∈ baH1. Now the cosets xH1 and baH1 have the element




x in common, so the two cosets must be equal: C = xH1 = baH1 = f(a, b).
This establishes that f is surjective. For injectivity, we refer to Exercise 10.3.

It follows that the number of left cosets of H1 in G is [G : H1] = |𝒞| =
|R1 × R2| = |R1| · |R2| = [G : H2] · [H2 : H1]. In particular, all these indexes
are finite.

Now we are ready to unleash the power of normal subgroups. We have 
seen in Section 7.7.2 that the kernel of a group homomorphism measures the 
amount of new relations we add in passing from the domain to the image of 
the homomorphism. In Section 7.7.4 (see especially Theorem 7.30) and later 
in Section 9.1 (especially Theorem 9.7), we saw that the image of a group
homomorphism looks like the quotient of the domain by the kernel, and that
any normal subgroup can serve as a kernel. Combining these ideas, we realize
that when we take the quotient of a group G by a normal subgroup N ⊴ G,
we are merely adding new relations to G by forcing every element of N to be 
the identity: that is, the quotient group G/N is what we get by starting with 
the group G and adding all relations of the form n = e for n € N. 

The usefulness of the construction ((S)) is that it allows us to use any 
subset S of G as a set of relations to be forced upon G. Now of course S
itself need not be a normal subgroup, or even a subgroup, of G; but ((S)) will 
be, and we think of the quotient group G/((S)) as the group G modified by 
adding exactly those relations forced on us by requiring s = e for every s € S. 


Example 10.6. We return to the dihedral group D2n in this example. We saw
in Section 5.5.2 that D2n is generated by two elements F and R satisfying the
relations F^2 = e, R^n = e, and FR = R^(-1)F (as well as infinitely many other
relations). Let ψ : Fr({x, y}) → D2n be the group homomorphism which
maps x to F and y to R, and let K = ker(ψ). As we have seen, relations
in D2n correspond to elements of K. We have x^2 ∈ K and y^n ∈ K. To
get an element of K from the third relation, we put it in standard form,
namely F^(-1)RFR = e. This leads to the realization that x^(-1)yxy ∈ K. Set
F = Fr({x, y}), S = {x^2, y^n, x^(-1)yxy} ⊆ F, and N = ((S)). Then we have
S ⊆ K and K ⊴ F, so N ≤ K, as N is the smallest normal subgroup of F
which contains S.

Let Q = F/N, and let x̄ = xN, ȳ = yN. Then x̄, ȳ ∈ Q, and we have
x̄^2 = xN · xN = x^2 N = N (since x^2 ∈ N) = e_Q. Similarly, we have ȳ^n = e_Q
and x̄ȳ = ȳ^(-1)x̄. These relations in Q illustrate what we have managed to do:
by forming a normal subgroup N of F containing the three dihedral relations
listed above, we forced corresponding relations to hold true in the resulting
quotient group Q. We also note that Q = ⟨x̄, ȳ⟩; for example, the element
(xyy)N of Q can be written as xN · yN · yN = x̄ȳȳ ∈ ⟨x̄, ȳ⟩. So Proposition
5.10 applies, with x̄ and ȳ playing the roles of F and R, and we get |Q| ≤ 2n.

Since ψ is surjective, we have D2n ≅ F/K by Theorem 9.7. In particular, we
have 2n = |D2n| = |F/K| = [F : K]. Now we also have [F : N] = |F/N| =
|Q| ≤ 2n. Since N ≤ K ≤ F and [F : N] < ∞, we can use Lemma 10.5
to conclude that [F : N] = [F : K] · [K : N]. So we have 2n ≥ [F : N] =
[F : K] · [K : N] = 2n · [K : N], which forces [K : N] ≤ 1. Since [K : N]
is a positive integer, we must have [K : N] = 1, and from Exercise 8.1, we
conclude that K = N.

The upshot of all this is that Fr({x, y}) / (({x^2, y^n, x^(-1)yxy})) ≅ D2n. In
other words, the three relations on which we have been focusing are enough
to completely determine the structure of the dihedral group D2n. When we
consider the enormous complexity of the set of relations corresponding to K,
this is an amazing result.

Since every group is the quotient of a free group—as we see by putting 
together Example 7.12 with Theorem 9.7—this example can (in principle) be 
generalized: given a group G, we can try to find a small set of generators 
for G, together with a small set of relations involving these generators, such 
that G is isomorphic to the corresponding quotient of the free group on these 
generators. 
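
The three dihedral relations can be checked in a concrete model. The sketch below assumes one common realization of D2n, as permutations of the vertices 0, ..., n−1 of a regular n-gon with R the rotation i ↦ i + 1 and F the reflection i ↦ −i (both mod n); this model is an assumption for illustration and may differ from the construction used earlier in the text.

```python
def compose(p, q):
    """Composition of permutations stored as tuples: (p o q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def power(p, k):
    """p composed with itself k times (the identity when k = 0)."""
    result = tuple(range(len(p)))
    for _ in range(k):
        result = compose(result, p)
    return result

def generated_subgroup(gens, identity):
    """Closure of gens under composition; this suffices in a finite group."""
    elems, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(x, g)
            if y not in elems:
                elems.add(y)
                frontier.append(y)
    return elems

n = 5
e = tuple(range(n))
R = tuple((i + 1) % n for i in range(n))   # rotation of the n-gon
F = tuple((-i) % n for i in range(n))      # a reflection of the n-gon

relations_hold = (
    power(F, 2) == e
    and power(R, n) == e
    and compose(F, R) == compose(inverse(R), F)
)
D = generated_subgroup([F, R], e)

print(relations_hold, len(D))
```

The generated group has 2n elements, matching the bound |Q| ≤ 2n obtained in the example.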


Example 10.7. Let G be a group. Let’s force G to be abelian. More precisely,
let us add just enough relations to G so that the resulting quotient group is
abelian. For each pair of elements a, b ∈ G, we need to include the relation
ab = ba. Putting this relation into standard form, we get aba^(-1)b^(-1) = e. So let


C = (({aba^(-1)b^(-1) : a, b ∈ G})).


Let G^ab = G/C. Then G^ab is the largest abelian quotient of G. More precisely,
we claim that G^ab is an abelian group, and that if σ : G → H is a surjective
group homomorphism where H is abelian, then we must have C ⊆ ker(σ);
this last assertion follows from Exercise 7.12. In fact, σ factors through G^ab
(see Exercise 9.15): we have σ = σ̄ ∘ ν, where ν is the natural map, and σ̄ is
the map on G^ab induced by σ. That is, the following diagram commutes (see
Definition 7.5):

G --ν--> G^ab --σ̄--> H,   with σ : G → H as the composite.


Thus, H must be isomorphic to a quotient of G^ab.


Definition 10.8. An element of the form aba^(-1)b^(-1) is called a commutator,
and the subgroup C of Example 10.7 is called the commutator subgroup of G.
The group G^ab is known as the abelianization of G.
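
As a small worked instance, the commutator subgroup of S3 can be computed by listing all commutators and generating a subgroup from them; by Exercise 10.2 this subgroup is already normal. The sketch assumes permutations of {0, 1, 2} stored as tuples, and the helper names are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Composition of permutations stored as tuples: (p o q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def generated_subgroup(gens, identity):
    """Closure of gens under composition; this suffices in a finite group."""
    elems, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(x, g)
            if y not in elems:
                elems.add(y)
                frontier.append(y)
    return elems

G = set(permutations(range(3)))
e = (0, 1, 2)

# All commutators a b a^-1 b^-1.
commutators = {
    compose(compose(a, b), compose(inverse(a), inverse(b)))
    for a in G for b in G
}
C = generated_subgroup(commutators, e)

def coset(x):
    """The left coset xC as a frozenset."""
    return frozenset(compose(x, c) for c in C)

quotient = {coset(x) for x in G}
quotient_is_abelian = all(
    coset(compose(a, b)) == coset(compose(b, a)) for a in G for b in G
)

print(len(C), len(quotient), quotient_is_abelian)
```

For S3 the abelianization has order 2, so every abelian quotient of S3 has order at most 2.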




10.3 Room for Everyone Inside


To end this chapter, let us return to the subject of group tables. Consider
once again the group table for the group G = Z3:

  + | 0  1  2
 ---+---------
  0 | 0  1  2
  1 | 1  2  0
  2 | 2  0  1


We will wring another insight from this little table. Notice that each row of 
the group table seems to contain every element of our group exactly once, in 
some scrambled order. A scrambling of the elements of G is, more formally, 
what we called a permutation of G. Thus, it seems that every row of the group 
table for G gives us a permutation of G. 

But each row of the group table is labeled (on the left) by an element of 
G. Thus, we can associate to every element of G a permutation of the set of 
elements of G. In other words, we have a function 


σ : G → Sym(G)


for which σ(g) is the permutation corresponding to the row labeled by g. In
the present case, for example, σ(1) is the permutation π_1 such that π_1(0) =
1 + 0 = 1, π_1(1) = 1 + 1 = 2, and π_1(2) = 1 + 2 = 0.

Since σ is a very natural-seeming function between two groups, then, according
to our principles, σ ought to be a group homomorphism. This claim is
justified in the following result, which applies to arbitrary groups. The reader
is invited to check that the function π_x defined below is the general version
of the function π_1. The function π_x is the “left multiplication by x” function,
which produces the row of the group table indexed by the element x.


Theorem 10.9 (Cayley’s Theorem). Let G be a group. Define a function
σ : G → Sym(G)


by the formula σ(x) = π_x, where π_x(g) = x · g for each g in G. Then σ is an
embedding of G into Sym(G).


The proof amounts to several formal verifications; no big ideas are needed, 
but rather we merely unravel definitions. 


Proof. Let G be a group. We first show that σ really does map G to Sym(G).
[Show ∀a ∈ G, π_a ∈ Sym(G); i.e., that π_a is a bijection from G to G]
Let a ∈ G. Suppose x, y ∈ G and π_a(x) = π_a(y). Then a · x = a · y (by
definition of π_a). So x = y by Lemma 3.46. Thus π_a is injective.

Let z ∈ G. [Solve the equation π_a(x) = z for x: a · x = z, x = a^(-1) z]

Let x = a^(-1) z. Then x ∈ G, and we have π_a(x) = a · x = a · (a^(-1) z) = z. So
π_a is surjective.

Therefore, π_a is bijective, so π_a ∈ Sym(G).

[Show that σ is an embedding: an injective group homomorphism]

Let a, b ∈ G. [Show σ(a · b) = σ(a) ∘ σ(b)]

Then σ(a · b) = π_{a·b}, σ(b) = π_b, and σ(a) = π_a.

[Show π_{a·b} = π_a ∘ π_b]

[Show ∀x ∈ G, π_{a·b}(x) = (π_a ∘ π_b)(x)]

Let x ∈ G. Then π_{a·b}(x) = (a · b)x and (π_a ∘ π_b)(x) = π_a(π_b(x)) = π_a(bx)
= a(bx). So π_{a·b}(x) = (π_a ∘ π_b)(x) by Axiom G1, associativity in G. Thus, σ
is a group homomorphism.

[Show σ is injective]

Let a, b ∈ G, and suppose that σ(a) = σ(b). [Show a = b]

Then π_a = π_b. So ∀x ∈ G, π_a(x) = π_b(x). Since e ∈ G, we have π_a(e) =
π_b(e). But this means that a · e = b · e, and so a = b. Thus, σ is injective. This
completes the proof.


Corollary 10.10. Every finite group is isomorphic to a subgroup of Sn for
some n.


Proof. Let G be a finite group, and set n = |G|. By Cayley’s Theorem, we
know that G ↪ Sym(G); from Exercise 9.6, we have Sym(G) ≅ Sn. Composing
these two maps, we get an embedding G ↪ Sn.
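
The embedding of Cayley’s Theorem can be written out for the running example G = Z3. In the sketch below, each row of the group table is stored as a tuple, so that the entry in position g of pi(x) is x + g; the function names are illustrative and not notation from the text.

```python
n = 3
G = list(range(n))

def op(x, y):
    return (x + y) % n

def pi(x):
    """Row of the group table indexed by x: position g holds x + g."""
    return tuple(op(x, g) for g in G)

def compose(p, q):
    """Composition of permutations stored as tuples: (p o q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

sigma = {x: pi(x) for x in G}

rows_are_permutations = all(sorted(sigma[x]) == G for x in G)
is_homomorphism = all(
    sigma[op(a, b)] == compose(sigma[a], sigma[b]) for a in G for b in G
)
is_injective = len(set(sigma.values())) == len(G)

print(sigma[1])
print(rows_are_permutations, is_homomorphism, is_injective)
```

Replacing op with the operation of any other finite group, with elements numbered 0, ..., n−1, runs the same three checks for that group.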


Note that Sym(G) does not “know” anything about the group operation 
of G, only about the underlying set of this group; yet Cayley’s Theorem tells 
us that G embeds in Sym(G) as a group! Cayley’s Theorem tells us that 
the family of symmetric groups is “universal” in a certain sense: every group 
can be found inside of one of the symmetric groups. Thus, by constructing 
subgroups of symmetric groups, we can make any group. 


10.4 Exercises 


Exercise 10.1. Let G be a group and let S ⊆ G. Set T = {xyx^(-1) : x ∈ G, y ∈
S}. Prove that ⟨T⟩ = ((S)).

Exercise 10.2. Let G be a group, and let S = {aba^(-1)b^(-1) : a, b ∈ G} be the
set of all commutators in G. Prove that ⟨S⟩ = ((S)).


Exercise 10.3. Let f be the function that was defined in the proof of Lemma 
10.5. Prove that f is injective. 


Exercise 10.4 (Cayley’s Theorem is sharp for primes). Let p be a prime integer,
and suppose that G is a group of order p. By Corollary 10.10, we know that
G ↪ Sp. Prove that if G ↪ Sn for some integer n, then we must have n ≥ p.
Hint: the order of Sn is n! = 1 · 2 · 3 · ... · n.




Exercise 10.5. Let S = {x1, ..., xn} be a set of order n < ∞, and let G =
Fr(S)^ab. (Note: the reader may use any non-empty set S, not just a finite set,
with no real increase in difficulty.) We call G the free abelian group on S.

(a) Let H be an abelian group, and let h1, ..., hn ∈ H. Let yi be the image
of xi under the natural map from Fr(S) to G. Prove that there is a unique
group homomorphism σ : G → H such that for all i we have σ(yi) = hi.

(b) Suppose that H is an abelian group which can be generated by a set
of n elements. Prove that H is isomorphic to a quotient of G.

(c) Let A = Z^n = {(k1, ..., kn) : ki ∈ Z}, with binary operation + given
by (k1, ..., kn) + (ℓ1, ..., ℓn) = (k1 + ℓ1, ..., kn + ℓn). Prove that G ≅ A. (For
more details about the close relationship between Z and finitely generated
abelian groups, see Chapter 18.)


Exercise 10.6. Let n be an integer with n ≥ 3. Let G be a group of order 2n
generated by two elements F and R which satisfy the three dihedral relations
F^2 = e, R^n = e, and FR = R^(-1)F. Prove that G ≅ D2n.


Exercise 10.7. Let n be an integer with n ≥ 3. Prove that we have


Fr({x, y}) / (({x^2, y^n, (xy)^2})) ≅ D2n.


(Compare to Example 10.6.) 




11 


Rings 


11.1 A New Type of Structure 


A group is a set with a single binary operation satisfying certain axioms which 
we abstracted from the properties of ordinary number systems. Motivated 
by our desire to gain further perspective on ordinary number systems, and 
generalize them, we next introduce a new kind of algebraic object called a ring. 
Rings are richer than groups, and even more like ordinary number systems, 
because they have not one but two binary operations. 


Definition 11.1. A ring is a triple (R, +, ·) satisfying the following 3 axioms:
R1: (R, +) is an abelian group with identity element 0;
R2: · is an associative binary operation on R;
R3: · distributes over +: that is, for all x, y, z ∈ R, we have (x + y) · z =
x · z + y · z and z · (x + y) = z · x + z · y.


Remark 11.2. The operation $\cdot$ may not be commutative; if it is, we say that
R is a commutative ring. Furthermore, $\cdot$ may not have an identity element;
if it does, then we denote this element by $1_R$ or simply 1, and say that “R
has 1.” We often speak of the operation $\cdot$ as “multiplication,” and call $1_R$ a
multiplicative identity element. Note that by Lemma 3.33, a multiplicative
identity element must be unique if it exists.


An element of R may or may not have an inverse under multiplication; we 
single out those elements which do: 


Definition 11.3. Let R be a ring with 1. Then a unit of R is an element of 
R which has an inverse under multiplication. 


Notation 11.4. The set of all units of R is denoted $R^\times$ (pronounced “R cross”).
That is,

$$R^\times = \{x \in R : \exists y \in R \text{ s.t. } x \cdot y = y \cdot x = 1\}.$$

Theorem 11.5. If R is a ring with 1, then $(R^\times, \cdot)$ is a group.


Proof. Since $(R, \cdot)$ satisfies the group axioms G1 and G2, Exercise 4.8 applies
to give the desired result.


DOI: 10.1201/9781003252139-11




Example 11.6. Each of the following familiar number systems is a ring under
ordinary addition and multiplication: $\mathbb{Z}$, $\mathbb{R}$, $\mathbb{Q}$. In fact, these are commutative
rings with 1. The group of units of $\mathbb{Z}$ is the set $\mathbb{Z}^\times$ of all integers which
have multiplicative inverses (reciprocals) which are also integers. Thus, we
have $\mathbb{Z}^\times = \{1, -1\}$. By contrast, almost every element of $\mathbb{R}$ has a multiplicative
inverse in $\mathbb{R}$: we have $\mathbb{R}^\times = \mathbb{R} - \{0\}$. Similarly, $\mathbb{Q}^\times = \mathbb{Q} - \{0\}$.


Example 11.7. The set of all 2 × 2 (two-by-two) matrices with real entries is

$$M_2(\mathbb{R}) = \left\{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} : a, b, c, d \in \mathbb{R} \right\}.$$

We claim that $M_2(\mathbb{R})$ is a ring under the operations of “matrix addition” and
“matrix multiplication” given by

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} + \begin{bmatrix} u & v \\ w & x \end{bmatrix} = \begin{bmatrix} a+u & b+v \\ c+w & d+x \end{bmatrix}$$

and

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \cdot \begin{bmatrix} u & v \\ w & x \end{bmatrix} = \begin{bmatrix} au+bw & av+bx \\ cu+dw & cv+dx \end{bmatrix}.$$

R1: It is straightforward to verify that $M_2(\mathbb{R})$ is an abelian group under
matrix addition.

R2: Matrix multiplication is associative by Exercise 3.4.

R3: The distributive property of matrix multiplication over matrix addition
is assigned to the reader as Exercise 11.1.

Exercise 4.10 shows that the group of units of $M_2(\mathbb{R})$ is

$$M_2(\mathbb{R})^\times = \{A \in M_2(\mathbb{R}) : |A| \neq 0\},$$

where |A| is defined as in that exercise. |A| is called the determinant of A. The
group $M_2(\mathbb{R})^\times$ is called the general linear group (of two-by-two real matrices),
and is denoted $GL_2(\mathbb{R})$.
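
As a concrete illustration of Example 11.7, the following Python sketch implements the two matrix operations and tests whether a matrix is a unit. It rests on assumptions that are ours: entries are exact rationals (a stand-in for real numbers), |A| is taken to be the usual determinant ad − bc (presumably the quantity defined in Exercise 4.10), and the helper names mat_add, mat_mul, det, and inverse are hypothetical.

```python
from fractions import Fraction

# A 2x2 matrix [[a, b], [c, d]] is stored as the tuple (a, b, c, d).

def mat_add(A, B):
    """Entrywise sum of two 2x2 matrices."""
    return tuple(x + y for x, y in zip(A, B))

def mat_mul(A, B):
    """Product of two 2x2 matrices, following the formula in Example 11.7."""
    a, b, c, d = A
    u, v, w, x = B
    return (a*u + b*w, a*v + b*x, c*u + d*w, c*v + d*x)

def det(A):
    """Assumed definition of |A|: the usual determinant ad - bc."""
    a, b, c, d = A
    return a*d - b*c

def inverse(A):
    """Return the inverse of A, or None when det(A) == 0 (A is not a unit)."""
    a, b, c, d = A
    D = det(A)
    if D == 0:
        return None
    D = Fraction(D)
    return (d / D, -b / D, -c / D, a / D)

IDENTITY = (1, 0, 0, 1)
A = (1, 2, 3, 4)   # determinant -2, so A is a unit
B = (0, 1, 1, 0)   # determinant -1, so B is a unit
N = (1, 2, 2, 4)   # determinant 0, so N is not a unit

print(mat_mul(A, B), mat_mul(B, A))          # the two products differ
print(mat_mul(A, inverse(A)) == IDENTITY)    # True
print(inverse(N))                            # None
```

The first line of output shows two different products, so multiplication in this ring is not commutative, even though the ring has 1.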


Notation 11.8. Let R be a ring and let $x, y \in R$. As is customary in an
abelian group, we denote the inverse of x under addition by $-x$. Also, we
write $y - x$ as a shorthand for $y + (-x)$. We use exponential notation for
repeated multiplication: $x^n$ denotes the product of n factors of x, if n is a
positive integer.

Warning 11.9. We do not attempt to define $x^n$ for a general ring element x
when n is negative or zero. The trouble is that in a ring, an element x may not
have an inverse under multiplication, and in fact the ring may not even have
1. The reader is invited to verify, however, that the usual law of exponents
$x^{m+n} = x^m \cdot x^n$ is true when m and n are positive integers. Furthermore, if x
is a unit, then we can raise x to any integer power, for in this case, x belongs
to the group of units, with the operation of ring multiplication.




11.2 Ring Fundamentals


The reader may wonder, in case R has 1, whether we can also write $(-1) \cdot a$ as
$-a$. Fortunately, these two expressions are always equal. The following lemma
establishes this identity as well as several others which are familiar from the
algebra of number systems.


Lemma 11.10 (Basic Properties of Rings). Let R be a ring. Let $x, y \in R$.
Then:

(i) $0 \cdot x = 0 = x \cdot 0$

(ii) $(-x) \cdot y = -(x \cdot y) = x \cdot (-y)$

(iii) $(-x) \cdot (-y) = x \cdot y$

Further, if R has 1, then:

(iv) $(-1) \cdot x = -x$

(v) $(-1)^2 = 1$


Proof. (i) We have 0 + 0 = 0 by Exercise 3.3. So we can say $0 \cdot x = (0 + 0) \cdot x$ (by
substitution) $= 0 \cdot x + 0 \cdot x$ (by the distributive property). By adding $-(0 \cdot x)$
to both sides, we find that $0 = 0 \cdot x$. The identity $x \cdot 0 = 0$ can be proved
similarly.

(ii) The trick here is to write $(x + (-x)) \cdot y = x \cdot y + (-x) \cdot y$, and realize
that the left-hand side is also equal to $0 \cdot y$, which is just 0 by part (i). Thus
$x \cdot y + (-x) \cdot y = 0$, and so $(-x) \cdot y = -(x \cdot y)$. The other identity may be
proved similarly by considering $x \cdot (y + (-y))$.

(iii) We have $(-x) \cdot (-y) = -(x \cdot (-y)) = -(-(x \cdot y))$, applying part (ii)
twice. By the Laws of Exponents, we have $-(-(x \cdot y)) = x \cdot y$. (Specifically, we
can apply Theorem 3.41 part (ii), with $a = x \cdot y$ and $m = n = -1$, interpreting
this result in additive notation.)

(iv) Supposing R has 1, then $(-1) \cdot x = -(1 \cdot x)$ (by part (ii)) $= -x$.

(v) We have $(-1)^2 = (-1) \cdot (-1)$ (by definition of exponential notation)
$= 1 \cdot 1$ (by part (iii)) $= 1$.


Example 11.11 (Calculations in a ring). Let R be a ring, and let $a, b, c \in R$.
We can take advantage of the commutativity of R under + to simplify the
expression

$$a + b - a - c$$

down to

$$b - c.$$

The reader should try to perform this simplification one step at a time, using
only one definition or result in each step. However, the familiar expression

$$(a + b) \cdot (a - b)$$




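does not equal $a^2 - b^2$ in general. Attempting to expand this product,
we find

$$\begin{aligned}
(a + b) \cdot (a - b) &= a \cdot (a - b) + b \cdot (a - b) && \text{(by Axiom R3)} \\
&= a \cdot a + a \cdot (-b) + b \cdot a + b \cdot (-b) && \text{(by R3 again)} \\
&= a^2 - a \cdot b + b \cdot a - b^2. && \text{(using Lemma 11.10)}
\end{aligned}$$

But because $\cdot$ need not be commutative, this expression cannot be further
simplified.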
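
To see the failure of the difference-of-squares identity in an actual ring, one can try a pair of non-commuting matrices in the ring of 2 × 2 matrices from Example 11.7. The sketch below is only an illustration: the particular matrices a and b, the tuple representation, and the helper names are our own choices.

```python
# A 2x2 integer matrix [[p, q], [r, s]] is stored as the tuple (p, q, r, s).

def mat_add(A, B):
    return tuple(x + y for x, y in zip(A, B))

def mat_neg(A):
    return tuple(-x for x in A)

def mat_mul(A, B):
    p, q, r, s = A
    u, v, w, x = B
    return (p*u + q*w, p*v + q*x, r*u + s*w, r*v + s*x)

def mat_sub(A, B):
    # y - x is shorthand for y + (-x), as in Notation 11.8
    return mat_add(A, mat_neg(B))

a = (0, 1, 0, 0)
b = (0, 0, 1, 0)

product = mat_mul(mat_add(a, b), mat_sub(a, b))          # (a + b)(a - b)
naive = mat_sub(mat_mul(a, a), mat_mul(b, b))            # a^2 - b^2
expanded = mat_sub(
    mat_add(mat_sub(mat_mul(a, a), mat_mul(a, b)), mat_mul(b, a)),
    mat_mul(b, b),
)                                                        # a^2 - ab + ba - b^2

print(product)     # (-1, 0, 0, 1)
print(naive)       # (0, 0, 0, 0)
print(expanded)    # (-1, 0, 0, 1)
```

Here the product agrees with the four-term expansion but not with a² − b², because a·b and b·a are different matrices.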

We next identify some especially nice types of ring, by imposing additional 
requirements above and beyond the ring axioms. In an arbitrary ring, there 
is nothing to prevent the product of two non-zero elements from being zero; 
the following definition ensures that this cannot happen. 

The condition “$1 \neq 0$” may seem odd in the definitions below. But in the
trivial ring (or zero ring) R = {0} whose only element is 0, it is formally true
that 0 is a multiplicative identity as well as an additive identity, so in this ring
at least, we do have 1 = 0; see Exercise 11.2.


Definition 11.12. A ring R is called an integral domain, or simply a domain,
if R is a commutative ring with 1, we have $1 \neq 0$, and

$$\forall x, y \in R, \; x \cdot y = 0 \implies x = 0 \text{ or } y = 0.$$


Another way in which a ring can be well-behaved is that most of its ele- 
ments can have inverses under multiplication; however, we might be surprised 
if the element 0 were to have a multiplicative inverse; see Exercise 11.3. The 
following definition serves up as nice a scenario as we can hope for in this 
regard: 

Definition 11.13. A field is a commutative ring R with $1 \neq 0$ such that
$R^\times = R - \{0\}$.

Perhaps surprisingly, there is a relationship between the two types of ring
just defined:


Proposition 11.14. Every field is a domain. 


Proof. Let F be a field. Then F is a commutative ring with 1, and $1 \neq 0$.
[Show that $\forall x, y \in F$, $x \cdot y = 0 \implies x = 0$ or $y = 0$.] Let $x, y \in F$, and suppose
that $x \cdot y = 0$. [Show x = 0 or y = 0.] [This is equivalent to $x \neq 0 \implies y = 0$.]
Suppose that $x \neq 0$. [Show y = 0.] Then $x \in F^\times$ (by definition of field). So we
can write $x^{-1} \cdot (x \cdot y) = x^{-1} \cdot 0$; so $(x^{-1} \cdot x) \cdot y = 0$ (by Axiom R2 and Lemma
11.10); $1 \cdot y = 0$; and thus y = 0.


In Chapter 4, we found a natural way to express the relationship between 
Z and R when considered as groups under addition, by introducing the notion 
of a subgroup. Now that we recognize the extra structure on these sets which 
makes them into rings, we would like to have a way to express this new 
relationship. The reader may wish at this point to review the discussion and 
definition of subgroup in Chapter 4 before reading further. 




Definition 11.15. Let $(R, +, \cdot)$ be a ring. A subring of R is a ring $(S, +_S, \cdot_S)$
such that $S \subseteq R$, $+_S = +|_{S \times S}$, and $\cdot_S = \cdot|_{S \times S}$.


Thus, a subring is a subset of a ring which is a ring in its own right, under 
the “same” operations as its parent ring (restricted, of course, to this subset). 
In particular, just as with groups and subgroups, there is only one way in 
which a subset S of a ring R can be a subring of R: the binary operations 
of addition and multiplication on S are inherited from those on R. As with 
groups, we seldom use the clumsy notation of restriction of functions when we 
describe the ring operations of a subring; instead, we use the same symbols + 
and - for the subring as we do for the parent ring. 

Notation 11.16. We write $S \leq R$ to mean S is a subring of R. We write $S < R$
to mean that S is a proper subring of R; that is, $S \leq R$ and $S \neq R$.


Proposition 11.17 (Subring Test). Let $S \subseteq R$, where R is a ring. Then
$S \leq R$ iff both

(SRT1): $(S, +) \leq (R, +)$

and

(SRT2): S is closed under multiplication: i.e., $\forall x, y \in S$, $x \cdot y \in S$.


Proof. Exercise 11.10. 


11.3 Ring Homomorphisms, Ideals, and Quotient Rings


We would like to define a notion of “homomorphism” that applies to rings. 
Taking our direction from our work with groups, it seems reasonable to define 
a ring homomorphism to be a function between two rings which commutes 
with both ring operations. This is what we shall do. 


Definition 11.18. Let R and S be rings. A ring homomorphism from R to
S is a function $\sigma : R \to S$ such that for all $x, y \in R$, we have

(RH1): $\sigma(x + y) = \sigma(x) + \sigma(y)$

and

(RH2): $\sigma(x \cdot y) = \sigma(x) \cdot \sigma(y)$.


We also define the same special types of homomorphism for rings that we 
defined for groups: 


Definition 11.19. (i) An isomorphism of rings, or ring isomorphism, is a 
bijective ring homomorphism. 

(ii) An embedding is an injective homomorphism. 

(iii) An automorphism of a ring is a ring isomorphism from a ring to itself.

Notation 11.20. We use the notation Aut(R) to denote the set of all automorphisms
of the ring R. As in the case of groups, the set Aut(R) forms a group
under composition of functions (Exercise 11.13).




Remark 11.21. We are now finally in a position to give a (partial) answer, in 
the language of abstract algebra, to a general question left over from Chapter 
2: for which number systems is the number game winnable? First we need to 
say what we mean by “number system.” Let us say for now that a number 
system means a subring R of C with 1 € R. Then the number game on 
R is winnable only if Aut(R) = {id}, the trivial group. As with groups, 
ring automorphisms represent symmetries; and a symmetry of R represents 
a way to re-arrange the elements of R so that a player cannot tell that a 
change has been made! Therefore, if there is more than one element in Aut(R), 
then we cannot possibly tell the correct labeling of the numbers. We note, 
however, that even if Aut(R) is trivial, there is still a question of finding 
a finite procedure for identifying numbers, which is not possible when R is 
uncountable. We are by no means done with ring automorphisms, however; 
we have merely given them a name. Ring automorphisms play a central role in 
Galois Theory (Chapter 16), for example, which will allow us to give a more 
complete answer to the number game question (see Exercise 16.26), among 
many other applications. 


Remark 11.22. Every ring homomorphism $\sigma : R \to S$ is also a group homomorphism
on the additive groups of the rings, because of (RH1). Thus, a
natural point of interest is the kernel of $\sigma$ viewed in this light, namely, the
set of all elements of R which map to $0_S$. We shall define the kernel of a ring
homomorphism to be exactly this set.


Definition 11.23. The kernel of a ring homomorphism $\sigma : R \to S$ is the
set

$$\ker(\sigma) = \{x \in R : \sigma(x) = 0_S\}.$$

That is, the kernel of $\sigma$ is its kernel as a group homomorphism from (R, +)
to (S, +).


As in the case of group homomorphisms, the kernel of a ring homomor- 
phism measures how much information is lost in passing from the domain to 
the image. The case when the kernel is {0}, which is as small as it is possible 
to be, corresponds to the situation of an embedding, where we lose no infor- 
mation, and a faithful copy of the domain is transported into the codomain 
(see Exercise 11.23). In this case, we have $R \cong \sigma(R)$.

When we asked which subsets of a group could be the kernel of a group 
homomorphism, we arrived at the notion of a normal subgroup. Let us now 
ask, Which subsets of a ring can be the kernel of a ring homomorphism? 

Suppose that $\sigma : R \to S$ is a ring homomorphism, and let $K = \ker(\sigma)$. To
start with, since K is the kernel of the group homomorphism $\sigma : (R, +) \to (S, +)$,
we must have $K \trianglelefteq (R, +)$, by Theorem 7.19. Now since (R, +) is abelian,
every subgroup of (R, +) is normal (by Exercise 11.4), so we may capture this
condition by writing simply $K \leq (R, +)$.

So far, we have only related the fact that K is a kernel to the additive
structure of the ring R. There is more to see when we consider the multiplicative
structure. For suppose that $x \in K$ and $a \in R$. Then we have




$\sigma(a \cdot x) = \sigma(a) \cdot \sigma(x)$ (by (RH2)) $= \sigma(a) \cdot 0$ (since $x \in \ker(\sigma)$) $= 0$ (by
Lemma 11.10). But this tells us that $a \cdot x$ is also in the kernel of $\sigma$, namely,
K. Likewise, we can see that $x \cdot a \in K$.

It turns out that these properties of a kernel of a ring homomorphism are 
enough to characterize a subset of a ring as a kernel, and so we record them 
in a definition: 


Definition 11.24. Let R be a ring. An ideal of R is a subset I of R such that
(Id1): $(I, +) \leq (R, +)$
and
(Id2): $\forall x \in I \; \forall a \in R$, $a \cdot x \in I$ and $x \cdot a \in I$.


Remark 11.25. We may say that an ideal is an additive subgroup of a ring 
(Id1) which “absorbs” arbitrary ring elements under multiplication (Id2). 


Remark 11.26. What we have just defined is sometimes referred to as a two- 
sided ideal, since it must absorb elements of R from both sides, the right and 
the left. We shall not consider one-sided ideals in this text. 


Remark 11.27. The name ideal is rooted in the history of algebraic num- 
ber theory. In brief, nineteenth-century mathematicians struggling to prove 
Fermat’s Last Theorem, the statement 


$$\forall x, y, z, n \in \mathbb{Z}^{+}, \; n > 2 \implies x^n + y^n \neq z^n,$$


encountered difficulties because the familiar uniqueness of factorization of in- 
tegers into primes fails in the case of more general number systems. The con- 
cept of an ideal was invented to salvage unique factorization; while numbers 
do not always factor uniquely, ideals (“ideal numbers”) do, in the appropriate 
context. 

Example 11.28 (Ideals of $\mathbb{Z}$). If I is an ideal of $\mathbb{Z}$, then $(I, +) \leq (\mathbb{Z}, +)$, so I
must be cyclic, by Exercise 5.6. So the only candidates for ideals of $\mathbb{Z}$ are the
sets $\langle n \rangle$ for $n \in \mathbb{Z}$. Recall from Example 4.18 that $\langle n \rangle$ is simply the set $n\mathbb{Z}$ of
all integer multiples of n. But this set has the absorption property (Id2): a
multiple of an element of $n\mathbb{Z}$ is another element of $n\mathbb{Z}$, since a multiple of a
multiple of n is another multiple of n. Therefore, every such set is an ideal of
$\mathbb{Z}$. To summarize: the ideals of $\mathbb{Z}$ are the sets $n\mathbb{Z}$ for each n in $\mathbb{Z}$.


Example 11.29 (Ideals of $\mathbb{R}$). Suppose that I is an ideal of $\mathbb{R}$ and that $a \in I - \{0\}$.
Then $a \in \mathbb{R}^\times$, so $a^{-1} \in \mathbb{R}$, and we have $a^{-1} \cdot a \in I$, by (Id2). Thus
$1 \in I$. Using (Id2) again, we find that any real number times 1 must be in I,
and therefore, all of $\mathbb{R}$ is in I; so $I = \mathbb{R}$. We conclude that the only possible
ideals of $\mathbb{R}$ are {0} and $\mathbb{R}$; it is easy to check that both of these sets are in fact
ideals of $\mathbb{R}$. The same argument generalizes to any field F (Exercise 11.7).


Roughly speaking, ideals of a ring are like normal subgroups of a group: 
they are the type of object which can be the kernel of a homomorphism, as we 
shall see. But before we reach this result, we come to another parallel between 
normal subgroups and ideals: just as we can form the quotient of a group by 
a normal subgroup, we can form the quotient of a ring by an ideal: 




Theorem 11.30 (Existence of a Quotient Ring). Let R be a ring, and let I
be an ideal of R. Then the set Q of all cosets of I in R under addition,

$$Q = \{a + I : a \in R\},$$

forms a ring under the operations

$$(a + I) + (b + I) = (a + b) + I$$

and

$$(a + I) \cdot (b + I) = (a \cdot b) + I.$$

We call Q the quotient ring of R by I.


Proof. Let I be an ideal of R, and set $Q = \{a + I : a \in R\}$. Then $(I, +) \leq (R, +)$
by (Id1), and this subgroup is normal because (R, +) is abelian (Exercise 11.4),
so Q forms a group under addition, by Theorem 7.26. Note
that (Q, +) is just the quotient group R/I where R is considered as a group
under addition. Now, (R/I, +) is an abelian group, by Exercise 11.5. (The
identity element of (R/I, +) is $0_{R/I} = 0_R + I = I$.)

The main point of the proof is to show that multiplication is well-defined
on R/I; this means that the product of two cosets should not depend on the
particular coset representatives we choose. So let $C_1, C_2 \in R/I$, and suppose
that we can write $C_1 = a_1 + I = b_1 + I$ and $C_2 = a_2 + I = b_2 + I$ with $a_1$, $a_2$,
$b_1$, $b_2$ in R. Then we have $d_1 := a_1 - b_1 \in I$ and $d_2 := a_2 - b_2 \in I$, by Exercise
8.2. Now we have

$$\begin{aligned}
a_1 \cdot a_2 &= (b_1 + d_1) \cdot (b_2 + d_2) \\
&= b_1 \cdot b_2 + b_1 \cdot d_2 + d_1 \cdot b_2 + d_1 \cdot d_2.
\end{aligned}$$

Since $d_1, d_2 \in I$, we have $b_1 \cdot d_2$, $d_1 \cdot b_2$, $d_1 \cdot d_2 \in I$, by (Id2). Therefore,
$a_1 \cdot a_2 - b_1 \cdot b_2 \in I$, and so $a_1 \cdot a_2 + I = b_1 \cdot b_2 + I$, by Exercise 8.2. This shows
that indeed our definition of $C_1 \cdot C_2$ does not depend on the choice of coset
representatives.

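Ring axioms R2 and R3 are now straightforward to verify in R/I. For
R2, we have

$$\begin{aligned}
((a + I) \cdot (b + I)) \cdot (c + I) &= ((a \cdot b) + I) \cdot (c + I) \\
&= ((a \cdot b) \cdot c) + I \\
&= (a \cdot (b \cdot c)) + I && \text{(by Axiom R2 for } R\text{)} \\
&= (a + I) \cdot ((b \cdot c) + I) \\
&= (a + I) \cdot ((b + I) \cdot (c + I)).
\end{aligned}$$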

Axiom R3 is left to the reader as Exercise 11.9. 


Notation 11.31. We write R/I for the quotient ring of R by I. We sometimes 
read this expression as “R modulo I” or “R mod I.” 




Remark 11.32. Suppose that R is a ring and I is an ideal of R; then I is
a subgroup of the abelian group (R, +), so we can form the quotient group
R/I or the quotient ring, also denoted R/I. However, no confusion is likely
to arise, because these two quotient objects are identical as sets, and also as
additive groups; the only difference is that the quotient ring has the extra
operation of multiplication.


Example 11.33 (Quotients of $\mathbb{Z}$). From Example 11.28, we know that for each
n in $\mathbb{Z}$, the set $n\mathbb{Z}$ is an ideal of $\mathbb{Z}$. Therefore, we can form the quotient ring
$\mathbb{Z}/n\mathbb{Z}$. Since this quotient ring is the same as the quotient group $\mathbb{Z}/n\mathbb{Z}$ with
the extra operation of multiplication, then in particular, we have $|\mathbb{Z}/n\mathbb{Z}| = n$
when $n \geq 1$ (see Example 7.29). Again, the elements of the quotient ring
$\mathbb{Z}/n\mathbb{Z}$ are the cosets of $n\mathbb{Z}$ in $\mathbb{Z}$, which are just the congruence classes modulo
n, namely $a + n\mathbb{Z}$ for $a \in \mathbb{Z}$.

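These quotients of $\mathbb{Z}$ provide our first examples of rings which are not
domains. For example, in $\mathbb{Z}/6\mathbb{Z}$, we have $(2 + 6\mathbb{Z}) \cdot (3 + 6\mathbb{Z}) = 6 + 6\mathbb{Z} = 0 + 6\mathbb{Z} = 0_{\mathbb{Z}/6\mathbb{Z}}$,
yet neither $2 + 6\mathbb{Z}$ nor $3 + 6\mathbb{Z}$ is 0 in $\mathbb{Z}/6\mathbb{Z}$.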

For which positive integers n is Z/nZ a domain? 

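To start with, we must have $n \geq 2$, since if n = 1, then we are dealing with
the trivial or “zero ring” $\mathbb{Z}/\mathbb{Z}$ with only one element, namely $0_{\mathbb{Z}/\mathbb{Z}} = 0 + \mathbb{Z} = \mathbb{Z}$
(see Exercise 11.6). In this trivial ring, we have 1 = 0, so it is not a domain.

So assume $n \geq 2$. The zero element of $\mathbb{Z}/n\mathbb{Z}$ is $n\mathbb{Z}$. So the condition we
need is

$$(a + n\mathbb{Z}) \cdot (b + n\mathbb{Z}) = n\mathbb{Z} \implies a + n\mathbb{Z} = n\mathbb{Z} \text{ or } b + n\mathbb{Z} = n\mathbb{Z}.$$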


Simplifying on the left and using the criterion of Exercise 8.2 for when two
cosets are equal, we need

$$a \cdot b \in n\mathbb{Z} \implies a \in n\mathbb{Z} \text{ or } b \in n\mathbb{Z}. \tag{11.1}$$

We can rewrite this implication in the language of divisibility, as

$$n \mid (a \cdot b) \implies n \mid a \text{ or } n \mid b. \tag{11.2}$$

But this is just the property that n is prime. So we have found: If n is a
positive integer, then $\mathbb{Z}/n\mathbb{Z}$ is a domain iff n is prime.

More generally, we can ask, For which ideals I of $\mathbb{Z}$ is $\mathbb{Z}/I$ a domain? Since
every ideal of $\mathbb{Z}$ is of the form $n\mathbb{Z}$ for some integer n, and since $n\mathbb{Z} = (-n)\mathbb{Z}$
for any n, the only ideal we haven’t yet checked is the zero ideal $0\mathbb{Z} = \{0\}$.
We have $\mathbb{Z}/\{0\} \cong \mathbb{Z}$ by Exercise 11.6, and $\mathbb{Z}$ is certainly a domain.
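
The criterion found in Example 11.33 can be checked by brute force for small n. The following Python sketch is an illustration only: it represents the congruence class a + nZ by the remainder a % n, and the function names zero_divisor_pairs, is_domain, and is_prime are hypothetical.

```python
from math import isqrt

def zero_divisor_pairs(n):
    """Pairs (a, b) of non-zero residues mod n with a <= b and a*b = 0 in Z/nZ."""
    return [(a, b) for a in range(1, n) for b in range(a, n) if (a * b) % n == 0]

def is_domain(n):
    """Z/nZ is a domain iff 1 != 0 (that is, n >= 2) and there are no zero divisors."""
    return n >= 2 and not zero_divisor_pairs(n)

def is_prime(n):
    """Trial division; adequate for the small values tested here."""
    return n >= 2 and all(n % d != 0 for d in range(2, isqrt(n) + 1))

print(zero_divisor_pairs(6))                       # [(2, 3), (3, 4)]
print([n for n in range(1, 20) if is_domain(n)])   # [2, 3, 5, 7, 11, 13, 17, 19]
print(all(is_domain(n) == is_prime(n) for n in range(1, 200)))   # True
```

Since Z/nZ is commutative, it is enough to search pairs with a ≤ b. A finite search like this is evidence for the statement, not a substitute for the argument given in the example.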


The rings $\mathbb{Z}/n\mathbb{Z}$ are more or less ubiquitous in mathematics because of
their usefulness. Indeed, beginning students of mathematics often study these
rings before encountering the general concept of a ring itself. The “ring of
integers modulo n,” as we sometimes refer to $\mathbb{Z}/n\mathbb{Z}$, is often represented using
the numbers $0, 1, \dots, n - 1$ to stand for the congruence classes $0 + n\mathbb{Z}$, $1 + n\mathbb{Z}$,
$\dots$, $(n - 1) + n\mathbb{Z}$. Then the operations of addition and multiplication




on Z/nZ are defined to be simply “addition and multiplication modulo n”: 
after performing the ordinary addition or multiplication, we reduce the result 
modulo n to get it into the appropriate range from 0 to n — 1. The reader 
should undertake to verify that this description of Z/nZ really does give the 
right ring. 
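
One way to begin that verification is to test numerically that "reduce after operating" does not depend on which representatives of the cosets are used, which is the well-definedness issue at the heart of the proof of Theorem 11.30. The sketch below makes assumptions of our own: the names add_mod, mul_mod, and check_well_defined are hypothetical, and a random test over finitely many representatives only supports the claim rather than proving it.

```python
import random

def add_mod(a, b, n):
    """Addition in Z/nZ on the representatives 0, 1, ..., n-1."""
    return (a + b) % n

def mul_mod(a, b, n):
    """Multiplication in Z/nZ on the representatives 0, 1, ..., n-1."""
    return (a * b) % n

def check_well_defined(n, trials=1000, seed=0):
    """Replace a and b by other members of a + nZ and b + nZ; results must agree."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.randrange(n), rng.randrange(n)
        a2 = a + n * rng.randint(-50, 50)   # another representative of a + nZ
        b2 = b + n * rng.randint(-50, 50)   # another representative of b + nZ
        if (a2 + b2) % n != add_mod(a, b, n):
            return False
        if (a2 * b2) % n != mul_mod(a, b, n):
            return False
    return True

print(add_mod(4, 5, 6), mul_mod(2, 3, 6))   # 3 0
print(check_well_defined(6))                # True
```

Python's % operator returns a value in the range 0 to n − 1 for positive n even when the left operand is negative, which is why negative representatives cause no trouble here.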

If n, a, and b are integers such that $a \mid b$ and $b \mid n$, then n/b is a quotient
of n/a. Conversely, the quotients of n/a are all of the form n/c where $a \mid c$.
Remarkably, the same phenomenon occurs in general for quotient rings, if we
replace n by a ring R, replace a and b by ideals I and J, and replace divisibility
by containment. That is the content of the following result.


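Lemma 11.34. Let R be a ring, and let I be an ideal of R. Then there is a
bijection

$$\psi : \{\mathfrak{J} : \mathfrak{J} \text{ is an ideal of } R/I\} \to \{J : J \text{ is an ideal of } R \text{ s.t. } J \supseteq I\}$$

given by

$$\psi(\mathfrak{J}) = \{a \in R : a + I \in \mathfrak{J}\}.$$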


The proof is requested in Exercise 11.21. Also see Exercise 7.17 for the 
analogous result for groups. 

The significance of Lemma 11.34 is that quotient rings are simpler than
their parent rings: the ideals of R/I come from the ideals of R which contain
I. Thus, in passing from R to R/I, we “lose” all ideals which do not contain
I, but keep the rest, modulo I.
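
For R = Z and I = 12Z, the correspondence of Lemma 11.34 can be watched directly: the ideals of Z/12Z should match the ideals dZ of Z that contain 12Z, that is, those with d dividing 12. The Python sketch below is an illustration under assumptions of our own. It finds ideals of Z/nZ as the sets of multiples of a single residue (relying on the fact that additive subgroups of a cyclic group are cyclic), it can only compare subsets of Z inside a finite window, and the names ideals_of_Zn, generator, and psi are hypothetical.

```python
def ideals_of_Zn(n):
    """Ideals of Z/nZ, each a frozenset of residues: the multiples of some residue g."""
    return {frozenset((g * k) % n for k in range(n)) for g in range(n)}

def generator(ideal, n):
    """Smallest positive integer d with d + nZ in the ideal (d = n for the zero ideal)."""
    nonzero = [x for x in ideal if x != 0]
    return min(nonzero) if nonzero else n

def psi(ideal, n, window):
    """The integers a in the window whose coset a + nZ lies in the given ideal."""
    return {a for a in window if a % n in ideal}

n = 12
window = range(-36, 37)
ideals = ideals_of_Zn(n)

generators = sorted(generator(J, n) for J in ideals)
matches = all(
    psi(J, n, window) == {a for a in window if a % generator(J, n) == 0}
    for J in ideals
)

print(generators)   # [1, 2, 3, 4, 6, 12]
print(matches)      # True
```

The generators found are exactly the positive divisors of 12, and on the window each ideal pulls back to the set of multiples of its generator, which is an ideal of Z containing 12Z.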

In case J is an ideal of R such that $I \subseteq J$, we indicated above that the ring
R/J should be a quotient of R/I, corresponding to the situation with ordinary
integers. What is actually true is that $R/J \cong (R/I)/\mathfrak{J}$, where $\mathfrak{J} = \psi^{-1}(J)$. If
we use the suggestive notation $\mathfrak{J} = J/I$, then we can rewrite this assertion in
the following form, which suggests cancellation of fractions:


Lemma 11.35. Let R be a ring, and let I and J be ideals of R such that
$I \subseteq J$. Then

$$(R/I)/(J/I) \cong R/J,$$

where by J/I we mean

$$J/I = \psi^{-1}(J) = \{a + I \in R/I : a \in J\}$$

and $\psi$ is the function defined in Lemma 11.34. (See Project 23.14 for a justification
of this notation in the proper context, where J/I will be given its own
structure as a type of algebraic object called a module.)


Proof. Exercise 11.22. 


We will now fulfill our mission to complete the analogy between ideals of 
a ring and normal subgroups of a group, by proving the analog for rings of a 
combination of Theorems 7.19, 7.30, and 9.7. 




Theorem 11.36 (Fundamental Theorem of Ring Homomorphisms). (i) Let
R be a ring, and let I be an ideal of R. Then the natural map

$$\sigma : R \to R/I$$

given by

$$\sigma(r) = r + I$$

is a surjective ring homomorphism, and $\ker(\sigma) = I$.
(ii) If $\tau : R \to S$ is any ring homomorphism, then $\tau(R) \leq S$, $\ker(\tau)$ is
an ideal of R, and we have $\tau(R) \cong R/\ker(\tau)$.


Proof. (i) Let R be a ring, and let I be an ideal of R. Define $\sigma : R \to R/I$
by $\sigma(r) = r + I$ for $r \in R$. Note first that $(I, +) \leq (R, +)$, by (Id1). So by
Theorem 7.30, $\sigma$ is a surjective group homomorphism from (R, +) to (R/I, +)
with $\ker(\sigma) = I$. Since the kernel of $\sigma$ as a ring homomorphism is the same
as its kernel as an additive group homomorphism, it only remains to show
that $\sigma$ respects multiplication. So let $x, y \in R$. Then $\sigma(x \cdot y) = (x \cdot y) + I$
$= (x + I) \cdot (y + I) = \sigma(x) \cdot \sigma(y)$, as required.

(ii) Let $\tau : R \to S$ be a ring homomorphism. Then $\tau$ is also a group
homomorphism, $\tau : (R, +) \to (S, +)$. By Theorem 7.13, we have $(\tau(R), +) \leq (S, +)$.
Further, $\tau(R)$ is closed under multiplication, since if $a, b \in \tau(R)$, then
we can write $a = \tau(\alpha)$, $b = \tau(\beta)$ for some $\alpha, \beta \in R$, and then $a \cdot b = \tau(\alpha) \cdot \tau(\beta) = \tau(\alpha \cdot \beta) \in \tau(R)$.
Therefore, we have $\tau(R) \leq S$ by the Subring Test.

Let $K = \ker(\tau)$. By Theorem 7.19, we have $(K, +) \leq (R, +)$, which establishes
property (Id1) for K. Let $r \in R$ and $x \in K$. Then we have

$$\begin{aligned}
\tau(r \cdot x) &= \tau(r) \cdot \tau(x) && \text{(by (RH2))} \\
&= \tau(r) \cdot 0_S && \text{(since } x \in K\text{)} \\
&= 0_S && \text{(by Lemma 11.10).}
\end{aligned}$$

Thus, $r \cdot x \in K$. Similarly, we can see that $x \cdot r \in K$. This establishes (Id2),
so K is an ideal of R.

Now by Theorem 9.7, the function $f : R/K \to \tau(R)$ given by the formula
$f(r + K) = \tau(r)$ is a well-defined group isomorphism. Since K is an ideal
of R, we can regard R/K as a ring, whose elements and addition operation
are the same as those of the group (R/K, +). To check whether f respects
multiplication, let $\alpha, \beta \in R/K$. Then we can write $\alpha = r + K$ and $\beta = s + K$
for some $r, s \in R$. We have $f(\alpha \cdot \beta) = f((r + K) \cdot (s + K)) = f((r \cdot s) + K)$
$= \tau(r \cdot s) = \tau(r) \cdot \tau(s)$ (since $\tau$ is a ring homomorphism) $= f(r + K) \cdot f(s + K)$
$= f(\alpha) \cdot f(\beta)$. Now we know that f is a ring homomorphism. Since f is an
isomorphism of additive groups, then f is a bijective function. This tells us
that f is also a ring isomorphism.
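
Part (ii) of Theorem 11.36 can be traced on a small example. The sketch below is our own illustration, not taken from the text: it uses the map τ from Z/12Z to Z/4Z that reduces a residue modulo 4 (well defined because 4 divides 12), with residues represented by the integers 0 to 11 and 0 to 3, and it checks by exhaustive search that τ is a ring homomorphism, computes its kernel K, and confirms that f(r + K) = τ(r) is a well-defined bijection from the cosets of K onto the image.

```python
N, M = 12, 4          # tau : Z/12Z -> Z/4Z; well defined because 4 divides 12
R = range(N)

def tau(a):
    """Reduce a residue modulo 12 to a residue modulo 4."""
    return a % M

# (RH1) and (RH2), checked on every pair of elements of Z/12Z.
is_hom = all(
    tau((a + b) % N) == (tau(a) + tau(b)) % M
    and tau((a * b) % N) == (tau(a) * tau(b)) % M
    for a in R for b in R
)

kernel = frozenset(a for a in R if tau(a) == 0)
image = {tau(a) for a in R}

# The cosets r + K of the kernel are the elements of the quotient ring R/K.
cosets = {frozenset((r + k) % N for k in kernel) for r in R}

# f(r + K) = tau(r): record the set of values tau takes on each coset.
values_on_coset = {C: {tau(r) for r in C} for C in cosets}
well_defined = all(len(v) == 1 for v in values_on_coset.values())
f = {C: min(v) for C, v in values_on_coset.items()}
is_bijection = len(set(f.values())) == len(cosets) and set(f.values()) == image

print(sorted(kernel))                      # [0, 4, 8]
print(len(cosets), sorted(image))          # 4 [0, 1, 2, 3]
print(is_hom, well_defined, is_bijection)  # True True True
```

The kernel {0, 4, 8} is the ideal generated by 4 in Z/12Z, and the four cosets of this kernel correspond one-to-one with the four elements of the image, as the theorem predicts.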




11.4 Exercises 


Exercise 11.1. Prove that matrix multiplication distributes over matrix addition
in $M_2(\mathbb{R})$: that is, verify Axiom R3 for $M_2(\mathbb{R})$.


Exercise 11.2. (a) Let R = {a} be a set of size one. Show that R forms a ring
with 1 = 0. (There is only one way to form a binary operation on R!)

(b) Suppose that R is a ring with 1 = 0. Prove that R has only one element,
so that R = {0}, the zero ring.


Exercise 11.3. Prove that if R is a ring with 1, and $0 \in R^\times$, then R = {0}.
Moral: you can divide by 0, but only if 0 = 1.


Exercise 11.4. Prove that if G is an abelian group and $H \leq G$, then $H \trianglelefteq G$.


Exercise 11.5. Prove that if G is an abelian group and $H \trianglelefteq G$, then G/H is
also abelian.


Exercise 11.6. Let R be a ring.
(a) Prove that the set {0} and the set R are ideals of R.
(b) Prove that $R/\{0\} \cong R$.
(c) Prove that $R/R \cong \{0\}$ (the zero ring).


Exercise 11.7. Prove that if F is a field, then the only ideals of F are {0} and
F.


Exercise 11.8. Prove that if R is a ring and I, J are ideals of R, then $I \cap J$ is
also an ideal of R.


Exercise 11.9. Complete the proof of Theorem 11.30 by verifying Axiom R3
in R/I.


Exercise 11.10. Prove the Subring Test, Proposition 11.17.


Exercise 11.11. Let R be a domain, and let $S \leq R$.
(a) Prove that if $1_R \in S$, then S is also a domain.
(b) Prove that if S is a domain, then $1_S = 1_R$ (and in particular, $1_R \in S$).
(c) Find an example of a commutative ring T with 1 which has a subring
V such that V is a domain, but $1_V \neq 1_T$. Hint: look in the ring of 2 × 2
matrices over $\mathbb{R}$.


Exercise 11.12. Prove the analog of Lemma 9.5 with groups and group isomorphisms
replaced by rings and ring isomorphisms.


Exercise 11.13. Prove that if R is a ring, then Aut(R) is a group under composition
of functions. (Exercise 11.12 may help.)


Exercise 11.14. Let $R = \{a + b\sqrt{2} : a, b \in \mathbb{Z}\}$. Prove that R is a subring of
$\mathbb{R}$, the ring of real numbers. (Compare Exercise 2.2.)


Exercise 11.15. Let R be a domain, and suppose that $\mathbb{Z} \leq R$. Let $\sigma$ be an
automorphism of R.

(a) Why must $\sigma(0) = 0$?




(b) Prove that $\sigma(1) = 1$.

(c) Prove by induction that $\sigma(n) = n$ for every $n \in \mathbb{Z}$ such that $n \geq 1$.

(d) Prove that we also have $\sigma(n) = n$ for all $n \in \mathbb{Z}$ such that $n < 0$. Thus,
$\sigma$ maps every integer to itself.

(e) Describe the group Aut(Z) explicitly. If you have read Chapter 2, then 
explain how your answer relates to the number game on Z. 


Exercise 11.16. Let the ring R be as in Exercise 11.14. Suppose that $\sigma$ is an
automorphism of R.

(a) Prove that $\sigma(n) = n$ for every $n \in \mathbb{Z}$. (The results of the previous two
exercises may be used.)

(b) Prove that $\sigma(\sqrt{2}) \in \{\sqrt{2}, -\sqrt{2}\}$.

(c) Use the previous parts of this exercise to prove that there are at most 
two different automorphisms of R, and give explicit formulas for them. 

(d) Check that each of your two candidate automorphisms from part (c) 
is actually an automorphism of R. If you have read Chapter 2, then explain 
how your result relates to the number game on R. 


Exercise 11.17. This exercise continues the notation of Exercise 11.16. Write
$\mathrm{Aut}(R) = \{e, \sigma\}$, where e is the identity map on R, and $\sigma$ sends $\sqrt{2}$ to $-\sqrt{2}$.

(a) Explain why, for any $a \in R$, we have $a \in \mathbb{Z}$ iff $\sigma(a) = a$.

(b) Prove that for all $a \in R$, we have $a \cdot \sigma(a) \in \mathbb{Z}$. Try to accomplish this
by using part (a), without writing a explicitly in terms of $\sqrt{2}$.

(c) Prove that $R^\times \cap \mathbb{Z} \subseteq \mathbb{Z}^\times$.

(d) Prove that $R^\times = \{a + b\sqrt{2} : a, b \in \mathbb{Z} \text{ and } a^2 - 2b^2 \in \{1, -1\}\}$.

(e) Find some units of R. Can you find all of them?


Exercise 11.18. For a ring R, define the center of R to be the set $C(R) := \{a \in R : \forall b \in R, \; ab = ba\}$.
Prove that we have $C(R) \leq R$.


Exercise 11.19. Find an example of a ring R with a subset S such that $(S, +) \leq (R, +)$
but S is not an ideal of R.


Exercise 11.20. (a) Write out addition and multiplication tables for the rings 
Z/nZ for each integer n between 1 and 6, inclusive. 

(b) Which of the rings in part (a) are domains? Which are fields? 

(c) Find all of the ideals of each ring in part (a). 


Exercise 11.21. Prove Lemma 11.34. 
Exercise 11.22. Prove Lemma 11.35. 


Exercise 11.23. Prove that a ring homomorphism is injective if and only if its 
kernel is {0}. Hint: Lemma 9.18 may be useful. 


Exercise 11.24. This exercise translates Exercise 9.15 to rings. Suppose that
$\sigma : R \to S$ is a ring homomorphism with kernel I, and that J is an ideal of
R with $J \subseteq I$. Prove that $\sigma$ factors through R/J, by showing that the map
$\bar{\sigma} : R/J \to S$ given by the formula $a + J \mapsto \sigma(a)$ (for $a \in R$) is a well-defined
ring homomorphism.




12 


Results on Commutative Rings 


12.1 Introduction 


In this chapter, we only consider rings which are commutative. Commutative
rings include all fields and domains, as well as many other, not-so-nice rings
which have “zerodivisors”: non-zero elements a and b for which $a \cdot b = 0$.

Our major theme in this chapter will be to relate the properties of quotient 
rings to the properties of the ideals we “divided” by. In particular, we will 
discover for which ideals I the quotient ring R/I is a domain, and for which 
ideals the quotient is a field. We will also discover how to produce ideals more 
or less at will (Section 12.3). 


12.2 Primes and Domains 


Revisiting Example 11.33, we notice that $\mathbb{Z}/n\mathbb{Z}$ is a domain iff n is a prime
element of $\mathbb{Z}$ (at least, for $n \geq 2$). This suggests that there is a relationship
between primality (“primeness”) and quotient rings being domains. To explore 
this relationship in general, we first need to come up with a suitable defini- 
tion of “prime” in more generality, which will apply to a wide class of rings 
instead of just to the ring of integers Z. We will actually define two notions 
of primeness: one for a ring element, and another for an ideal. 

The discussion in Example 11.33 above, and, in particular, Equation 11.1, 
motivates the following general definition for when an ideal is to be called 
“prime.” 


Definition 12.1 (Prime Ideal). Let R be a commutative ring with 1, and let
I be an ideal of R. Then I is called prime if $I \neq R$ (I is “proper”), and

$$\forall a, b \in R, \; a \cdot b \in I \implies a \in I \text{ or } b \in I.$$

Notice that in Example 11.33, we characterized prime integers as those
integers greater than one which, if they divide a product of two factors, must
divide one of the factors. Guided by this idea, we will now define the notion
of prime element in an arbitrary domain. First, we extend the definition of
divisibility to arbitrary commutative rings with 1.

DOI: 10.1201/9781003252139-12


Definition 12.2. Let R be a commutative ring with 1, and let $a, b \in R$. To
say $b \mid a$ (read “b divides a”) means $\exists c \in R$ s.t. $a = b \cdot c$.


Warning 12.3. When we write $b \mid a$, it is important to know which ring R the
factors of a are allowed to come from. Notice that no description of R is built
in to the notation for divisibility. Thus, for example, while we have $2 \nmid 3$ in
$\mathbb{Z}$, yet it is true that $2 \mid 3$ in $\mathbb{R}$.
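
Warning 12.3 can be made concrete with exact arithmetic. The sketch below is an illustration under assumptions of our own: it uses Python's Fraction type for the rationals, which serve as an exactly computable stand-in for the real numbers (the witness c = 3/2 is rational, hence real), and the function names divides_in_Z and witness_in_Q are hypothetical.

```python
from fractions import Fraction

def divides_in_Z(b, a):
    """b | a in Z: some integer c satisfies a = b * c."""
    if b == 0:
        return a == 0
    return a % b == 0

def witness_in_Q(b, a):
    """Return a rational c with a == b * c, or None if no such c exists."""
    if b == 0:
        return Fraction(0) if a == 0 else None
    return Fraction(a, b)

print(divides_in_Z(2, 3))   # False: no integer c has 3 = 2 * c
c = witness_in_Q(2, 3)
print(c, 2 * c == 3)        # 3/2 True
```

The same pair (b, a) = (2, 3) gives different answers because the factor c is drawn from a different ring in each case.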


Definition 12.4 (Prime Element). Let R be a commutative ring with 1, and
let $a \in R - \{0\}$. Then a is prime in R if $a \notin R^\times$ and

$$\forall b, c \in R, \; a \mid (b \cdot c) \implies a \mid b \text{ or } a \mid c.$$


Remark 12.5. In our general definition of a prime element, we could not cap- 
ture the condition that a prime should be greater than 1, because there is 
no notion of ordering in an arbitrary ring. As a result, there are some inte- 
gers which are prime according to Definition 12.4 but are not prime according 
to the usual definition of prime for integers; see Exercise 12.3. This is not a 
bad thing: our general definition of “prime” is actually more natural, even for 
ordinary integers. 

Although our general definition of a prime element does what we want, 
the reader may be more familiar with the definition of a prime number as a 
number which has “no” factors: more precisely, no positive factors except for 
one and itself. 

If we try to define primality of an integer in terms of factorization, without 
referring to any notion of ordering (e.g. positive or negative), then both the 
“trivial” factors of 1 and $-1$ should not count as factors, since any integer $n$ can
be factored using these numbers. For example, we can write $n = (-n) \cdot (-1) =
n \cdot 1$.

The astute reader will recognize 1 and $-1$ as exactly the units of $\mathbb{Z}$. In
general, any ring element may be “factored” using units. This observation gives 
rise to the following definition, which describes those elements of a domain 
which have no non-trivial factors, and so cannot be “reduced” into a product 
of simpler terms. Informally, the definition simply says that a unit does not 
count as a factor. 


Definition 12.6. Let $R$ be a domain, and let $a \in R - \{0\}$. Then $a$ is irreducible
in $R$ if $a \notin R^*$ and

$$\forall b, c \in R,\quad a = b \cdot c \implies b \in R^* \text{ or } c \in R^*.$$


Remark 12.7. Although the notions of prime and irreducible are equivalent 
for ordinary positive integers, they are not equivalent for a general domain 
R. However, it is true in any domain that every prime element is irreducible 
(Exercise 12.11). It is in a sense the discrepancy between these two notions 
which measures the extent to which unique factorization fails in R. 
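
To make the gap between "prime" and "irreducible" concrete, here is a small sketch in a ring that the text itself does not discuss: $\mathbb{Z}[\sqrt{-5}] = \{a + b\sqrt{-5} : a, b \in \mathbb{Z}\}$, a standard example. The pair encoding `(a, b)` and the helper names `mul`, `norm`, and `divides` are our own. The sketch relies on two standard facts that are not proved in this section: the norm $N(a + b\sqrt{-5}) = a^2 + 5b^2$ is multiplicative, and the units of this ring are exactly the elements of norm 1.

```python
def mul(x, y):
    """Multiply a + b*sqrt(-5) by c + d*sqrt(-5); elements are pairs (a, b)."""
    (a, b), (c, d) = x, y
    return (a * c - 5 * b * d, a * d + b * c)

def norm(x):
    """Norm N(a + b*sqrt(-5)) = a^2 + 5*b^2."""
    a, b = x
    return a * a + 5 * b * b

def divides(x, y):
    """Return True if x | y in Z[sqrt(-5)], for nonzero x."""
    a, b = x
    num = mul(y, (a, -b))          # y * conjugate(x) equals (y / x) * N(x)
    n = norm(x)
    return num[0] % n == 0 and num[1] % n == 0

two = (2, 0)
u, v = (1, 1), (1, -1)             # 1 + sqrt(-5) and 1 - sqrt(-5)

# 2 divides the product u*v = 6 but divides neither factor, so 2 is not prime.
print(mul(u, v))                                   # (6, 0)
print(divides(two, mul(u, v)), divides(two, u), divides(two, v))  # True False False

# A factorization 2 = b*c into non-units would force N(b) = N(c) = 2,
# but no element has norm 2, so 2 is irreducible.
print(any(norm((a, b)) == 2 for a in range(-2, 3) for b in range(-1, 2)))  # False
```

The same product $6 = 2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5})$ is the usual illustration of a failure of unique factorization.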




Next we come to the first result which relates a nice property of a quotient 
ring to a property of the corresponding ideal. 


Theorem 12.8. Let R be a commutative ring with 1, and let I be an ideal of 
R. Then R/I is a domain iff I is prime. 


Proof. ($\Rightarrow$): Suppose that $R/I$ is a domain. [Show $I$ is prime] [Show $I$ is proper
and $\forall a, b \in R$, $a \cdot b \in I \implies a \in I$ or $b \in I$]

By definition of domain, we have $0_{R/I} \neq 1_{R/I}$. Now $0_{R/I} = I = 0 + I$ and
$1_{R/I} = 1 + I$, so we can conclude from Exercise 8.2 that $1 - 0 \notin I$, i.e., $1 \notin I$.
In particular, $I \neq R$, which means $I$ is proper.

Let $a, b \in R$, and suppose that $a \cdot b \in I$. [Idea: we must somehow work
with properties of $R/I$ to conclude that $a \in I$ or $b \in I$. How can we get into
$R/I$ given $a$ and $b$? Answer: form the cosets $a + I$ and $b + I$.]

Then $a + I, b + I \in R/I$, and we have $(a \cdot b) + I = 0 + I$ (since $a \cdot b \in I$).
We can rewrite this as $(a + I) \cdot (b + I) = 0_{R/I}$. Since $R/I$ is a domain, this
forces $a + I = 0_{R/I}$ or $b + I = 0_{R/I}$, which in turn means $a \in I$ or $b \in I$, as
desired.

($\Leftarrow$): Suppose that $I$ is prime. [Show $R/I$ is a domain] [Show $0_{R/I} \neq 1_{R/I}$
and $\forall x, y \in R/I$, $x \cdot y = 0_{R/I} \implies x = 0_{R/I}$ or $y = 0_{R/I}$]

First, assume for a contradiction that $0_{R/I} = 1_{R/I}$. Writing this equation
using coset notation, it says $0 + I = 1 + I$. By Exercise 8.2, we have $1 \in I$.
Since $1 \in R^*$, we conclude from Exercise 12.6 that $I = R$, so $I$ is not proper.
This contradicts the definition of prime ideal.

Let $x, y \in R/I$, and suppose that $x \cdot y = 0_{R/I}$. Then $x = a + I$ and $y = b + I$
for some $a, b \in R$ (by definition of $R/I$). So we have $(a + I) \cdot (b + I) = 0_{R/I} =
0 + I$. Thus $(a \cdot b) + I = 0 + I$, and so $a \cdot b \in I$, by Exercise 8.2. Since $I$ is prime,
we know that $a \in I$ or $b \in I$. This in turn gives $a + I = 0 + I$ or $b + I = 0 + I$,
which means $x = 0_{R/I}$ or $y = 0_{R/I}$, as desired.
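
Theorem 12.8 can be sanity-checked in the motivating case $R = \mathbb{Z}$, $I = n\mathbb{Z}$. The sketch below is an illustration only, under the assumption that elements of $\mathbb{Z}/n\mathbb{Z}$ are represented by the residues $0, \ldots, n-1$; the function names are our own. It searches $\mathbb{Z}/n\mathbb{Z}$ for zero divisors by brute force and compares the answer with ordinary primality of $n$.

```python
import math

def is_prime(n: int) -> bool:
    """Trial-division test for ordinary positive primes."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, math.isqrt(n) + 1))

def quotient_is_domain(n: int) -> bool:
    """For n >= 2, check that Z/nZ has no zero divisors.

    Residues 1..n-1 represent the non-zero cosets; the ring is a domain
    when no two of them multiply to the zero coset.
    """
    return all((x * y) % n != 0 for x in range(1, n) for y in range(1, n))

domains = [n for n in range(2, 21) if quotient_is_domain(n)]
print(domains)   # [2, 3, 5, 7, 11, 13, 17, 19]
```

A finite search of this kind is evidence, not a proof; the proof above is what covers every commutative ring with 1.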


12.3 The Ideal Generated by a Set


Let $R$ be a commutative ring with 1, and let $S \subseteq R$ be a non-empty subset of
$R$. What is the smallest ideal $I$ of $R$ which contains $S$?

First of all, if $s \in S$ and $r \in R$, then we must have $rs \in I$, by (Id2). More
generally, if $s_1, s_2, \ldots, s_n \in S$ and $r_1, r_2, \ldots, r_n \in R$, then (Id2) requires
that we have $r_1 s_1, r_2 s_2, \ldots, r_n s_n \in I$, and then (Id1) forces $I$ to be closed under
addition, so we must also have $r_1 s_1 + r_2 s_2 + \cdots + r_n s_n \in I$. It turns out that
we can stop here: the following result says that the set of all elements of this
form is already an ideal! Because of its importance, this kind of set deserves
its own notation:




Notation 12.9. Let $R$ be a commutative ring with 1. Let $S \subseteq R$. Then

$$(S) := \left\{ \sum_{i=1}^{k} r_i s_i \;:\; k \in \mathbb{N},\ r_i \in R,\ s_i \in S \right\}. \tag{12.1}$$

We refer to $(S)$ as the ideal generated by $S$; this terminology is justified by
Theorem 12.12 below. If $S = \{s_1, s_2, \ldots, s_n\}$ is a finite set, then we also
write $(s_1, s_2, \ldots, s_n)$ for $(\{s_1, s_2, \ldots, s_n\})$, omitting the set braces inside of
the parentheses.


Definition 12.10. The expression $\sum_{i=1}^{k} r_i s_i$ is called an $R$-linear combination
of elements of $S$. Thus the set $(S)$ is the set of all $R$-linear combinations of
elements of $S$.


Remark 12.11. We allow 0 as a natural number, and so we may get a sum
with $k = 0$ terms in the expression above. We interpret a sum with no terms
(or “empty sum”) to be equal to 0. Conveniently, this takes care of the case
when $S = \emptyset$; for the smallest ideal containing the empty set as a subset is
simply the smallest ideal of all, which is just the ideal $\{0\}$. In practice, we
usually write $(0)$ for this zero ideal, in preference to $\{0\}$ or $(\{0\})$ or even $(\emptyset)$.


Theorem 12.12. The set (S) given by Equation 12.1 is an ideal of R. It is 
the smallest ideal of R which contains S. 


Proof. Suppose that $R$ is a commutative ring with 1 and $S \subseteq R$. Set $I = (S)$.
The main point of the proof is to show that sums and negations of $R$-linear
combinations of elements of $S$ are again $R$-linear combinations of elements of
$S$, as is the product of an $R$-linear combination with an element of $R$.

[Show (Id1): $(I, +) \leq (R, +)$] [Use the Subgroup Test]

[ST0] By Remark 12.11, we know that $0 \in I$, so $I$ is non-empty.

[ST1] Let $\alpha \in I$. Then we can write $\alpha = \sum_{i=1}^{k} a_i x_i$ for some $k \in \mathbb{N}$,
$a_i \in R$, and $x_i \in S$, by definition of $(S)$. Therefore, we have

$$\begin{aligned}
-\alpha &= -\left( \sum_{i=1}^{k} a_i x_i \right) \\
&= \sum_{i=1}^{k} -(a_i x_i) \\
&= \sum_{i=1}^{k} (-a_i) x_i. \quad \text{(by Lemma 11.10)}
\end{aligned}$$

Since $-a_i \in R$ for each $i$, the element $-\alpha$ is again an $R$-linear combination of
elements of $S$, and so $-\alpha \in I$.

[ST2] Let $\alpha, \beta \in I$, and write $\alpha = \sum_{i=1}^{k} a_i x_i$, $\beta = \sum_{j=1}^{\ell} b_j y_j$ with $k, \ell \in \mathbb{N}$,
$a_i, b_j \in R$, and $x_i, y_j \in S$. Then $\alpha + \beta = a_1 x_1 + \cdots + a_k x_k + b_1 y_1 + \cdots + b_\ell y_\ell \in (S)$,
since it is an $R$-linear combination of elements of $S$.

Therefore, we have $(I, +) \leq (R, +)$.

[Show (Id2): $I$ absorbs elements of $R$ under multiplication from either
side]

Let $\alpha = \sum_{i=1}^{k} a_i x_i \in I$, and let $r \in R$. Then we have

$$\begin{aligned}
r\alpha &= \sum_{i=1}^{k} r(a_i x_i) \quad \text{(by distributing repeatedly using Axiom R3)} \\
&= \sum_{i=1}^{k} (r a_i) x_i \quad \text{(by Axiom R2, associativity of multiplication)}
\end{aligned}$$

which is again an $R$-linear combination of elements of $S$. Since $R$ is assumed
to be a commutative ring, we have $\alpha r = r\alpha \in I$ as well.

We have proved that $(S)$ is an ideal of $R$. That $(S)$ is the smallest ideal
of $R$ containing $S$ is justified by the discussion immediately preceding this
theorem, together with Exercise 12.4.
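
For a finite ring, the set $(S)$ of Equation 12.1 can be computed by brute force. The sketch below assumes $R = \mathbb{Z}/n\mathbb{Z}$ with elements stored as residues $0, \ldots, n-1$; the names `generated_ideal` and `is_ideal` are hypothetical helpers, not notation from the text. It starts from the empty sum $0$ and keeps adding terms $r \cdot s$ until nothing new appears, then checks (Id1) and (Id2) directly.

```python
def generated_ideal(n, gens):
    """All R-linear combinations of elements of gens, for R = Z/nZ."""
    gens = {s % n for s in gens}
    ideal = {0}                    # the empty sum (k = 0 terms)
    frontier = [0]
    while frontier:
        x = frontier.pop()
        for s in gens:
            for r in range(n):
                y = (x + r * s) % n    # extend a known sum by one more term r*s
                if y not in ideal:
                    ideal.add(y)
                    frontier.append(y)
    return ideal

def is_ideal(n, subset):
    """Check (Id1) via the subgroup test and (Id2) via absorption, in Z/nZ."""
    if 0 not in subset:
        return False
    closed_add = all((x + y) % n in subset for x in subset for y in subset)
    closed_neg = all((-x) % n in subset for x in subset)
    absorbs = all((r * x) % n in subset for r in range(n) for x in subset)
    return closed_add and closed_neg and absorbs

I = generated_ideal(12, {8, 6})
print(sorted(I))                   # [0, 2, 4, 6, 8, 10]
print(is_ideal(12, I))             # True
print(generated_ideal(12, set()))  # {0}
```

The last line reproduces Remark 12.11: the ideal generated by the empty set is the zero ideal.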


Remark 12.13. Only in the final step of the proof above did we use the fact 
that R was commutative. However, if R is not assumed to be commutative, 
then our construction of (S) yields only the smallest left ideal of R containing 
S. The reader is invited to find a simple expression for the smallest two-sided 
ideal containing S in a general (non-commutative) ring. 


When we studied the subgroup generated by a set, the simplest case, that of 
a subgroup generated by a single element, received special attention: we called 
such groups cyclic. The corresponding notion here is the ideal generated by a 
single element of a ring: 


Definition 12.14. An ideal which is generated by a single element is called
principal. That is, if $R$ is a commutative ring with 1, then an ideal $I$ of $R$ is
called principal if $I = (a)$ for some $a \in R$.


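A principal ideal $(a)$ in a commutative ring $R$ (with 1) is the set of all
$R$-linear combinations of elements of the singleton set $\{a\}$. Technically, in our
definition of the ideal $(S)$, we did not require the elements $s_i$ in the sum on
the right-hand side of Equation 12.1 to be distinct. Thus, we have

$$\begin{aligned}
(a) &= \left\{ \sum_{i=1}^{k} r_i s_i \;:\; k \in \mathbb{N},\ r_i \in R,\ s_i \in \{a\} \right\} \\
&= \left\{ \sum_{i=1}^{k} r_i a \;:\; k \in \mathbb{N},\ r_i \in R \right\} \\
&= \left\{ \left( \sum_{i=1}^{k} r_i \right) a \;:\; k \in \mathbb{N},\ r_i \in R \right\} \\
&= \{ r \cdot a \;:\; r \in R \},
\end{aligned}$$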


where the last equality holds because a finite sum of elements of R is a single 
element of R and vice versa. Thus, the principal ideal generated by a is the set 
of all “multiples” of a by elements of R. We may therefore use coset notation 
for this set, using the multiplication operation of the ring R, and write 


(a) = Ra = aR. 


More generally, there is no reason to include repeated elements of $S$ among
the $s_i$’s when we build the ideal $(S)$, since we could factor out the common $s_i$
terms. Thus, we have the following simplified description of $(S)$ in case $S$ is a
finite set:


Lemma 12.15. Let $S = \{s_1, \ldots, s_n\}$ be a finite subset of a commutative ring
$R$ with 1. Then we have

$$(S) = \left\{ \sum_{i=1}^{n} r_i s_i \;:\; r_i \in R \right\} = R s_1 + \cdots + R s_n.$$


While units get in the way of factoring elements of a ring, they quietly 
go away when we introduce them into ideals, as the following result shows. 
This is part of what makes the theory of ideals the “right” place to talk about 
factorization. 


Lemma 12.16. Let $R$ be a domain, and let $a, b \in R$. Then $(a) = (b)$ iff
$b = u \cdot a$ for some $u \in R^*$. Moreover, if $a \neq 0$ and $(a) = (ca)$ for some $c \in R$,
then $c \in R^*$.


Proof. Exercise 12.5. 


Example 12.17. We saw in Example 11.28 that in the ring Z, every ideal has 
the form nZ for some n in Z. We can recognize now that such ideals are 
principal: we have nZ = (n), using the notation introduced in this section. 
Thus, every ideal of Z is principal. 
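
When an ideal of $\mathbb{Z}$ is handed to us by two generators, as $(a, b) = \mathbb{Z}a + \mathbb{Z}b$, a single generator can be computed explicitly: it is $\gcd(a, b)$. This is a standard fact which the text does not state at this point, so the sketch below should be read as an illustration under that assumption. The function name `extended_gcd` is our own. It returns $g = \gcd(a, b)$ together with integers $x, y$ such that $ax + by = g$, which exhibits $g$ as a $\mathbb{Z}$-linear combination of $a$ and $b$, so $g \in (a, b)$; conversely $g$ divides both $a$ and $b$, so $a, b \in (g)$.

```python
def extended_gcd(a: int, b: int):
    """Return (g, x, y) with g = gcd(a, b) >= 0 and a*x + b*y == g."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        # Invariant: a*old_x + b*old_y == old_r and a*x + b*y == r.
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:                  # normalize the generator to be non-negative
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y

g, x, y = extended_gcd(12, 18)
print(g, 12 * x + 18 * y)          # 6 6
print(12 % g == 0 and 18 % g == 0) # True: both generators lie in (g)
```

So $(12, 18) = (6) = 6\mathbb{Z}$, in agreement with Example 12.17.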




Rings with this property are important enough to get their own designa- 
tion, although the term we use to describe them is not very imaginative: 


Definition 12.18. A domain in which every ideal is principal is called a 
principal ideal domain, or PID. 


We now have quite an impressive hierarchy of types of ring. By Exercise 
12.7, every field is a PID, so we can say 


$$\text{field} \implies \text{PID} \implies \text{domain} \implies \text{commutative}.$$


To conclude this section, we will ask another question related to building
ideals in a commutative ring with 1: What is the smallest ideal which contains
both a given ideal $I$ and a given element $x$? The reader should attempt to
prove the following result (Exercise 12.8):


Lemma 12.19. Let $R$ be a commutative ring with 1, let $I$ be an ideal of $R$,
and let $x \in R$. Then the set

$$J = I + Rx = \{ c + y \cdot x \;:\; c \in I \text{ and } y \in R \}$$

is an ideal of $R$, and $J$ is the smallest ideal of $R$ which contains both $I$ and $x$.


12.4 Fields and Maximal Ideals 


A field, a commutative ring with $1 \neq 0$ in which every non-zero element has a
multiplicative inverse (Definition 11.13), is in a sense the nicest type of ring.
Let us now explore the question, For which ideals $I$ is $R/I$ a field?

Suppose that $R$ is a commutative ring with 1, $I$ is an ideal of $R$, and $R/I$
is a field.

Then we have $(R/I)^* = R/I - \{0_{R/I}\}$. Recall that $0_{R/I} = I = 0 + I$. So
for a typical element $x + I$ of $R/I$, to say $x + I \neq 0_{R/I}$ means exactly that
$x \notin I$, i.e., $x \in R - I$.

Now suppose that $x \in R - I$. Then $x + I \in (R/I)^*$, so there exists
$y \in R$ such that $(y + I) \cdot (x + I) = 1_{R/I} = 1 + I$. This equation says that
$(y \cdot x) + I = 1 + I$, which in turn tells us that $1 - y \cdot x \in I$.

Set $c = 1 - y \cdot x$. Then $c \in I$, and we have $y \cdot x + c = 1$. This statement
tells us that any ideal $J$ of $R$ which contains both $I$ and $x$ must also contain
1: for if $I \subseteq J$ and $x \in J$, then we have $y \cdot x \in J$ by (Id2), so $y \cdot x + c \in J$ by
(Id1).

Another way to make this argument is that since $c \in I$ and $y \cdot x \in Rx$,
we have $1 \in I + Rx$. But $I + Rx$ is the smallest ideal of $R$ containing
both $I$ and $x$, so $J$ must contain 1.




An ideal of $R$ which contains 1 must contain $(1) = R \cdot 1 = R$, and therefore
must be equal to $R$ itself. Therefore, our argument shows that for any element
$x$ outside of $I$, there is no proper ideal of $R$ which contains both $I$ and $x$. More
informally speaking, we can’t make $I$ any bigger as an ideal without capturing
the entire ring $R$; there is no ideal of $R$ bigger than $I$ except for $R$ itself. This
tells us that $I$ is maximal in the following sense:


Definition 12.20. Let $R$ be a commutative ring with 1, and let $I$ be an ideal
of $R$. Then $I$ is called maximal if $I$ is proper, and for any ideal $J$ of $R$,

$$I \subsetneq J \subseteq R \implies J = R.$$

This discussion motivates the following beautiful result:


Theorem 12.21. Let R be a commutative ring with 1, and let I be an ideal 
of R. Then R/I is a field iff I is maximal. 


Proof. ($\Rightarrow$): Suppose that $R/I$ is a field. [Show $I$ is maximal: use the definition
of maximal]

[Show $I$ is proper]

Assume for a contradiction that $I = R$. Then $1 \in I$, so $0 + I = 1 + I$,
which says that $0_{R/I} = 1_{R/I}$, contradicting the definition of field. Therefore,
$I$ must be proper.

[Show for all ideals $J$, $I \subsetneq J \subseteq R \implies J = R$]

Suppose that $J$ is an ideal of $R$ and $I \subsetneq J \subseteq R$. [Show $J = R$]

Then $\exists x \in J - I$ (since $I \subsetneq J$). Now since $x \notin I$, we have $x + I \neq 0 + I$, so
$x + I \in R/I - \{0_{R/I}\}$. Therefore, $x + I \in (R/I)^*$ (by definition of field). So
there is some element $y + I$ of $R/I$ (with $y \in R$) such that $(y + I) \cdot (x + I) =
1_{R/I} = 1 + I$. Thus, $(y \cdot x) + I = 1 + I$, and $1 - y \cdot x \in I$. Set $c = 1 - y \cdot x$. Then
$c \in I \subsetneq J$, and we have $y \cdot x + c = 1$. Now $c \in J$, and also $y \cdot x \in J$ (because
$x \in J$ and $y \in R$), and so $y \cdot x + c \in J$, since $J$ is a group under addition
(Id1). Therefore, $1 \in J$, and so $J \supseteq (1) = R$. Since we also have $J \subseteq R$, we
conclude that $J = R$, as desired.

($\Leftarrow$): Suppose that $I$ is maximal. [Show $R/I$ is a field]

[Show $R/I$ is a commutative ring with $1_{R/I} \neq 0_{R/I}$]

First, $R/I$ is a commutative ring, by Exercise 12.1. Furthermore, we have
$1_{R/I} \neq 0_{R/I}$ because $I$ is proper: if $1 + I = 0 + I$, then $1 \in I$, so $I = R$, and
$I$ would not be proper.

[Show $(R/I)^* = R/I - \{0_{R/I}\}$]

Let $\alpha \in R/I - \{0_{R/I}\}$. Then $\alpha = x + I$ for some $x \in R - I$. Let $J = I + Rx$.
Then $J$ is an ideal of $R$ containing both $I$ and $x$, so we have $I \subsetneq J \subseteq R$. By
definition of maximal, we must have $J = R$. In particular, since $1 \in R$, we
have $1 \in J$, so we can write $1 = c + y \cdot x$ for some $c \in I$ and $y \in R$. Therefore,
$1 - y \cdot x \in I$, and so we have $y \cdot x + I = 1 + I$ in $R/I$. We can rewrite this last
equation as $(y + I) \cdot (x + I) = 1_{R/I}$, which shows that $x + I \in (R/I)^*$.

We have shown that $R/I - \{0_{R/I}\} \subseteq (R/I)^*$. Now we certainly can’t have
$0_{R/I} \in (R/I)^*$, because then, by Exercise 11.3, we would have $R/I = \{0_{R/I}\}$,
whereas we know that $R/I$ has at least two distinct elements, since we already
proved that $1_{R/I} \neq 0_{R/I}$. Thus, we must have $(R/I)^* = R/I - \{0_{R/I}\}$. We
conclude that $R/I$ is a field.
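
The key step of the proof, writing $1 = c + y \cdot x$ with $c \in I$, can be carried out by hand in the case $R = \mathbb{Z}$, $I = n\mathbb{Z}$. The following sketch is an illustration under that assumption, with hypothetical helper names `inverse_witness` and `quotient_is_field`. For each $x \notin n\mathbb{Z}$ it searches for such a pair $(c, y)$; the coset $y + n\mathbb{Z}$ is then the inverse of $x + n\mathbb{Z}$.

```python
def inverse_witness(n: int, x: int):
    """Search for (c, y) with c in nZ and c + y*x == 1; return None if none exists.

    It suffices to try y = 0..n-1, because only the coset y + nZ matters.
    """
    for y in range(n):
        c = 1 - y * x
        if c % n == 0:             # c lies in the ideal nZ
            return c, y
    return None

def quotient_is_field(n: int) -> bool:
    """For n >= 2: every non-zero coset of Z/nZ has an inverse."""
    return all(inverse_witness(n, x) is not None for x in range(1, n))

print(inverse_witness(7, 3))       # (-14, 5): 1 = -14 + 5*3, and -14 lies in 7Z
print(inverse_witness(6, 2))       # None: 6Z + 2Z = 2Z is a proper ideal
print([n for n in range(2, 20) if quotient_is_field(n)])
# [2, 3, 5, 7, 11, 13, 17, 19]
```

The failure for $n = 6$ reflects the chain $6\mathbb{Z} \subsetneq 2\mathbb{Z} \subsetneq \mathbb{Z}$, which shows that $6\mathbb{Z}$ is not maximal.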


As a corollary to Theorem 12.21, we get a relationship between the notions 
of prime and maximal for ideals in a commutative ring with 1. 


Corollary 12.22. In a commutative ring with 1, every maximal ideal is 
prime. 


Proof. Let R be a commutative ring with 1, and let M be a maximal ideal of 
R. Then R/M is a field (by Theorem 12.21). Therefore, R/M is a domain (by 
Proposition 11.14). Now by Theorem 12.8, we conclude that M is prime. 


Remark 12.23. The relationship between the notions of prime and maximal 
may not be obvious from the original definitions of these two terms (the reader 
is invited to scrutinize both definitions). Now, however, we can say that if an 
ideal is not prime, then it is not maximal. So it is tempting to ask whether 
there is a simple way to enlarge a non-prime ideal to a bigger, but still proper, 
ideal. This is accomplished in Exercise 12.9. 


Example 12.24. What are the maximal ideals of Z? By Corollary 12.22, the 
only candidates for maximal ideals are among the prime ideals of Z. We found 
in Example 11.33 that the only prime ideals of Z are (0) and the ideals (p), 
where p is a positive prime number. 

Recall that an ideal is maximal if it is proper and it is not contained in
any other proper ideal. When does one ideal of $\mathbb{Z}$ contain another? Let us take
two ideals $n\mathbb{Z}$ and $m\mathbb{Z}$. If $n\mathbb{Z} \subseteq m\mathbb{Z}$, then we must have in particular that
$n \in m\mathbb{Z}$, and so $n$ must be an integer multiple of $m$; in other words, we must
have $m \mid n$. Conversely, if $m \mid n$, then every multiple of $n$ is also a multiple of
$m$, so we have $n\mathbb{Z} \subseteq m\mathbb{Z}$. This argument shows that $n\mathbb{Z} \subseteq m\mathbb{Z}$ iff $m \mid n$: “To
divide is to contain.” See Exercise 12.10 for a general version of this principle.

Now, no positive prime integer is a multiple of another, so there are no
proper containment relationships among prime ideals of $\mathbb{Z}$ of the form $(p)$
for $p > 0$. Therefore, the only proper containment relationships among prime
ideals of $\mathbb{Z}$ must involve the zero ideal, $(0)$. And in fact, we have

$$(0) \subsetneq (p)$$

for every prime $p \in \mathbb{Z}$. We conclude that the maximal ideals of $\mathbb{Z}$ are precisely
the ideals of the form $(p)$, where $p$ is a positive prime.


As a consequence of Example 12.24 and Theorem 12.21, we are entitled to 
state the following corollary: 


Corollary 12.25. Z/pZ is a field whenever p is a prime element of Z. 


Notation 12.26. When $p$ is a prime element of $\mathbb{Z}$, then the field $\mathbb{Z}/p\mathbb{Z}$ is
sometimes written GF($p$), where “GF” stands for Galois field. Évariste Galois
(pronounced “gal WAH”, rhymes with François and Ah!) was a brilliant
mathematician who is responsible for many of the crucial insights in the origins of
abstract algebra.




Remark 12.27. We saw in Example 12.24 that the only proper containment
relationships among prime ideals of $\mathbb{Z}$ are of the form

$$(0) \subsetneq (p)$$

where $p$ is an ordinary (positive) prime of $\mathbb{Z}$. In general, in a commutative
ring $R$ with 1, a collection of prime ideals $P_0, P_1, \ldots, P_k$ such that

$$P_0 \subsetneq P_1 \subsetneq \cdots \subsetneq P_k$$


is called a chain of prime ideals of length k. The dimension of R (also called 
the Krull dimension) is defined to be 


dim(R) = the maximum length of any chain of prime ideals of R. 


Thus, we have dim(Z) = 1. The reader may wonder why we start numbering at 
zero instead of one here: the answer is that dimension in this sense is intimately 
related to “dimension” in the classic sense of geometry, and starting at zero 
is the correct choice to make the two notions of dimension match. 


12.5 Exercises 


Exercise 12.1. Prove that if $R$ is a commutative ring and $I$ is an ideal of $R$,
then $R/I$ is also commutative.


Exercise 12.2. Let R be a commutative ring with 1. Prove that R is a domain 
iff (0) is a prime ideal of R. 
Exercise 12.3. Let R be a commutative ring with 1. 

(a) Prove that if a is a prime element of R, and u is a unit of R, then au 
is also a prime element of R. 

(b) Prove that if $R$ is a domain, $a$ is an irreducible element of $R$, and $u$ is
a unit of $R$, then $au$ is also an irreducible element of $R$.

(c) Find all of the prime elements of Z (using the general definition of 
“prime,” Definition 12.4). Part (a) may help! 


Exercise 12.4. Prove that if $R$ is a commutative ring with 1 and $S \subseteq R$, then
$S \subseteq (S)$.

Exercise 12.5. Prove Lemma 12.16.

Exercise 12.6. Let $R$ be a commutative ring with 1, and let $a \in R$. Prove that
$(a) = R$ iff $a \in R^*$.

Exercise 12.7. Prove that every field is a PID. (Exercise 11.7 will be useful 
here.) 




Exercise 12.8. Prove Lemma 12.19. Can you generalize this result to find the
smallest ideal containing a given ideal $I$ together with a finite collection of
elements $x_1, \ldots, x_n$ of $R$?


Exercise 12.9. Let $R$ be a commutative ring with 1, and let $I$ be an ideal of
$R$. Suppose that $a, b \in R$ and $a \cdot b \in I$ but $a \notin I$ and $b \notin I$. Prove that
$I + Ra \neq R$ and $I + Rb \neq R$. This shows “directly” (well, by contrapositive,
but without using quotient rings) that a maximal ideal must be prime.


Exercise 12.10 (To Divide Is To Contain). Let $R$ be a commutative ring with
1, and let $a, b \in R$. Prove that $(a) \supseteq (b)$ iff $a \mid b$ in $R$.


Exercise 12.11. Let $R$ be a domain, and let $a \in R$. Prove that if $a$ is a prime
element of $R$, then $a$ is irreducible in $R$.


Exercise 12.12. Let $R$ be a commutative ring with 1, and let $a \in R - \{0\}$.
Prove that $(a)$ is a prime ideal of $R$ iff $a$ is a prime element of $R$.


Exercise 12.13. Let $p$ be a positive prime integer, and let $n$ be any integer.

(a) Why must $|(\mathbb{Z}/p\mathbb{Z})^*| = p - 1$?

(b) Prove that if $p \nmid n$, then $n + p\mathbb{Z} \in (\mathbb{Z}/p\mathbb{Z})^*$. Conclude that $n^{p-1} \equiv 1$
(mod $p$) if $p \nmid n$.

(c) Prove that $n^p \equiv n$ (mod $p$), whether or not $p \mid n$. This is Fermat’s
Little Theorem.


Exercise 12.14. Fermat’s Little Theorem (see Exercise 12.13) can often be
used to quickly show that a given integer $m$ is not prime, without the need
to factor $m$. If we can factor $m - 1$, then a converse can also be approached.
As an illustration in a very special case, let $r$ be a positive integer, and let
$m = 2^r + 1$.

(a) Prove that if $n^{m-1} \not\equiv 1$ (mod $m$) for some positive integer $n < m$, then
$m$ is not prime.

(b) Show how we can compute $n^{m-1}$ modulo $m$ in only $r$ steps.

(c) Prove that if $n^{(m-1)/2} \equiv -1$ (mod $m$) for some integer $n$, then $m$ is
prime. Hint: Show that if $p$ is a prime which divides $m$, then for such an $n$,
the order of $n + p\mathbb{Z}$ in the group $(\mathbb{Z}/p\mathbb{Z})^*$ is $m - 1$.


Exercise 12.15. Let $p$ be a positive prime of $\mathbb{Z}$ such that $p \equiv 3$ (mod 4). Prove
that $(\mathbb{Z}/p\mathbb{Z})^*$ has no element of order 4. Conclude that the congruence

$$n^2 \equiv -1 \pmod{p}$$

has no integer solution $n$.


Exercise 12.16. Let $n$ be a positive integer. Let $R = \mathbb{Z}/n\mathbb{Z}$, and let $a \in \mathbb{Z}$.
Prove that if $\gcd(a, n) > 1$ then $a + n\mathbb{Z} \notin R^*$.


Exercise 12.17. This exercise continues Exercise 12.16. Let
$U = \{ b + n\mathbb{Z} \;:\; b \in \mathbb{Z} \text{ and } \gcd(b, n) = 1 \} \subseteq R$.
Let $a \in \mathbb{Z}$ with $\gcd(a, n) = 1$, and set
$\alpha = a + n\mathbb{Z} \in U$. Define a function $f : R \to R$ by $f(\beta) = \alpha\beta$ for $\beta \in R$.

(a) Prove that $f$ is a group homomorphism from $G$ to $G$, where $G = (R, +)$.

(b) Prove that $f$ is injective. Since $R$ is finite, conclude that $f$ is bijective,
hence $f \in \mathrm{Aut}(G)$.

(c) Deduce that $\alpha \in R^*$.

(d) Conclude that $R^* = U$.


Exercise 12.18. Use Exercise 9.16 together with Exercise 12.17 to prove that
for any positive integer $n$, we have $\mathrm{Aut}(\mathbb{Z}/n\mathbb{Z}, +) \cong (\mathbb{Z}/n\mathbb{Z})^*$ via the map
$\sigma \mapsto \sigma(1 + n\mathbb{Z})$.


13 


Vector Spaces 


13.1 Introduction 


In this chapter, we introduce our third type of algebraic structure, the vector 
space. Some readers may have encountered vectors, or vector spaces, in previ- 
ous mathematics courses, or even in other areas of study, such as physics. To 
motivate our definition, we first give an example. 


Example 13.1 (Informal Physics Vectors). Let us consider the set of all “arrows”
which can be drawn inside the ordinary Cartesian plane, $\mathbb{R}^2$. An arrow
may have its “tail” at any point, and its “head” (where we draw the arrow- 
head, or tip) at any other point—or possibly at the same point as the tail. 
We consider two arrows to be the same if they point in the same direction 
and have the same length, regardless of where the arrows are placed within 
the plane; see Figure 13.1. 


FIGURE 13.1: Vectors in $\mathbb{R}^2$ as arrows


We define a way to “add” two arrows v and w by drawing them so that 
the tail of w is located at the head of v, and take the sum to be the arrow 
whose head is located at w’s head and whose tail is located at v’s tail. 


DOI: 10.1201/9781003252139-13 125 




We also define a way to “multiply” an arrow by a real number. To multiply
the arrow $v$ by the number $c$, we stretch out $v$ by a factor of $c$, keeping its
direction the same; only, if $c < 0$, we flip the direction by 180 degrees, and
then stretch by a factor of $|c|$.

This example follows the style which some scientists use to describe vectors; 
an arrow in this example is a vector, and the collection of all arrows will be a 
vector space. 


The example above used real numbers in an essential way, as was common 
in the classical treatment of vectors. It turns out, however, that the theory of 
vector spaces works just as well when we replace $\mathbb{R}$ by an arbitrary field.


13.2 Abstract Vector Spaces


To capture the conditions necessary for a general, or abstract, vector space, 
we need a general description of the addition and multiplication operations. 


Definition 13.2. Let $(F, +_F, \cdot_F)$ be a field. A vector space over $F$ (or
$F$-vector space) is a triple $(V, +_V, \cdot_V)$, where $(V, +_V)$ is an abelian group and $\cdot_V$
is a function

$$\cdot_V : F \times V \to V$$

satisfying the following axioms for all $x, y \in V$ and all $a, b \in F$:

(VS1): $(a \cdot_F b) \cdot_V x = a \cdot_V (b \cdot_V x)$ [Associativity]
(VS2(a)): $a \cdot_V (x +_V y) = a \cdot_V x +_V a \cdot_V y$ [Left Distributivity]
(VS2(b)): $(a +_F b) \cdot_V x = a \cdot_V x +_V b \cdot_V x$ [Right Distributivity]
(VS3): $1_F \cdot_V x = x$ [Unitary Law]

We usually are less formal, using the same symbol $+$ for both addition
operations and using $\cdot$ for both multiplication operations, so the axioms become

(VS1): $(a \cdot b) \cdot x = a \cdot (b \cdot x)$ [Associativity]
(VS2(a)): $a \cdot (x + y) = a \cdot x + a \cdot y$ [Left Distributivity]
(VS2(b)): $(a + b) \cdot x = a \cdot x + b \cdot x$ [Right Distributivity]
(VS3): $1_F \cdot x = x$ [Unitary Law]

Note that by (VS1), we can safely write $a \cdot b \cdot x$ without parentheses. A vector
is an element of $V$. We call $F$ the field of scalars or base field of the vector
space, and we refer to an element of $F$ as a scalar. The operation $\cdot_V$ is called
scalar multiplication.


Remark 13.3. As with groups and rings, we often suppress the operations 
when we talk about vector spaces, and refer to V by itself as a vector space. 




Example 13.4. The Cartesian plane $\mathbb{R}^2$ is a vector space over $\mathbb{R}$ via the
operations

$$(x, y) + (w, z) = (x + w, y + z)$$

and

$$c \cdot (x, y) = (cx, cy)$$

for $x, y, w, z, c \in \mathbb{R}$. This vector space is equivalent to the one in Example
13.1, if we identify an arrow whose tail is located at $(a, b)$ and head at $(c, d)$
with the vector $(c - a, d - b)$, which gives the displacement from the tail to
the head.

In this example, we say that addition and scalar multiplication are
performed componentwise, for evident reasons (look at the formulas above). This
example generalizes to any Cartesian power of any field: if $F$ is a field and $n$
is a positive integer, then the set $F^n$ is a vector space under the operations of
componentwise addition and scalar multiplication.
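
The componentwise operations are easy to experiment with over a finite field. The sketch below takes $F = \mathbb{Z}/p\mathbb{Z}$ with $p = 3$ (a field by Corollary 12.25), stores vectors in $F^n$ as tuples of residues, and checks axioms (VS1) through (VS3) exhaustively for $n = 2$. The names `P`, `vec_add`, `scalar_mul`, and `check_axioms` are our own, and the abelian group axioms for addition are not checked here.

```python
from itertools import product

P = 3  # a prime, so Z/PZ is a field

def vec_add(v, w):
    """Componentwise addition in F^n, F = Z/PZ."""
    return tuple((a + b) % P for a, b in zip(v, w))

def scalar_mul(c, v):
    """Componentwise scalar multiplication in F^n, F = Z/PZ."""
    return tuple((c * a) % P for a in v)

def check_axioms(n: int) -> bool:
    """Exhaustively check (VS1), (VS2(a)), (VS2(b)), (VS3) on F^n."""
    scalars = range(P)
    vectors = list(product(range(P), repeat=n))
    for x in vectors:
        if scalar_mul(1, x) != x:                                  # (VS3)
            return False
        for a in scalars:
            for b in scalars:
                if scalar_mul((a * b) % P, x) != scalar_mul(a, scalar_mul(b, x)):
                    return False                                   # (VS1)
                if scalar_mul((a + b) % P, x) != vec_add(scalar_mul(a, x), scalar_mul(b, x)):
                    return False                                   # (VS2(b))
            for y in vectors:
                if scalar_mul(a, vec_add(x, y)) != vec_add(scalar_mul(a, x), scalar_mul(a, y)):
                    return False                                   # (VS2(a))
    return True

print(vec_add((1, 2), (2, 2)))   # (0, 1)
print(scalar_mul(2, (1, 2)))     # (2, 1)
print(check_axioms(2))           # True
```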


Example 13.5. We claim that $\mathbb{C}$ is “naturally” a vector space over $\mathbb{R}$. Why?
Well, $\mathbb{R}$ and $\mathbb{C}$ are fields, and $\mathbb{R} \leq \mathbb{C}$. So $\mathbb{C}$ is an abelian group under addition,
and there is a natural way to multiply a real number by a complex number
to produce a complex number. We know that the associative and distributive
laws are true in these familiar number systems, and that 1 is a multiplicative
identity element.

This example generalizes: if $F$ is a field and $R$ is a ring with 1 such that
$F \leq R$ (as rings), then $R$ is naturally a vector space over $F$: we use the addition
and multiplication operations which $R$ possesses as a ring to make $R$ into an
$F$-vector space (see Exercise 13.1). In this situation, we can even multiply a
vector by another vector, since the multiplication operation is defined for any
two elements of $R$, and not just for an element of $F$ with an element of $R$.
However, we shall not discuss the idea of “multiplying two vectors” in this 
chapter (such a notion is useful and important, but we shall not need it here; 
see e.g. Definition 23.62 for one such possibility). 


Lemma 13.6 (Basic Properties of Vector Spaces). Let $V$ be a vector space
over a field $F$. Then for all $a \in F$ and all $x \in V$, we have:

(1) $0_F \cdot x = 0_V$

(2) $a \cdot 0_V = 0_V$

(3) $(-1_F) \cdot x = -x$


Proof. (1): Let $x \in V$. Then $0_F \cdot x = (0_F + 0_F) \cdot x = 0_F \cdot x + 0_F \cdot x$. Since $0_F \cdot x \in V$,
and $V$ is an abelian group under addition, we can use the Cancellation Laws
for groups to conclude that $0_V = 0_F \cdot x$, as desired.

(2): Let $a \in F$. Then $a \cdot 0_V = a \cdot (0_V + 0_V) = a \cdot 0_V + a \cdot 0_V$. Cancellation
in $(V, +)$ again lets us conclude that $a \cdot 0_V = 0_V$.

(3): Let $x \in V$. We have $0_V = 0_F \cdot x$ (by (1) above) $= (1_F + (-1_F)) \cdot x
= 1_F \cdot x + (-1_F) \cdot x = x + (-1_F) \cdot x$. Adding $-x$ to both sides completes the
proof.




Now that we have some experience defining subgroups and subrings, it 
should not be difficult to define a “substructure” for a new kind of algebraic 
object: in this case, for a vector space. The idea again is that one vector 
space W naturally lives inside of another vector space V if W C V and W 
“inherits” its operations from V. Instead of calling W a sub-vectorspace, we 
follow common practice and call it a subspace. 


Definition 13.7. Let $(V, +_V, \cdot_V)$ be a vector space over a field $F$. A subspace
of $V$ is a vector space $(W, +_W, \cdot_W)$, also over $F$, such that $W \subseteq V$,
$+_W = +_V|_{W \times W}$, and $\cdot_W = \cdot_V|_{F \times W}$.


Notation 13.8. We write $(W, +_W, \cdot_W) \leq (V, +_V, \cdot_V)$ to indicate the subspace
relationship. Just as with groups and subgroups, if we are given a vector space
$(V, +, \cdot)$ over a field $F$ together with a subset $W \subseteq V$, then there is at most
one way to define operations on $W$ to make $W$ a subspace of $V$: namely,
by restricting the operations of $V$. Accordingly, we write $W \leq V$ to mean
$(W, +|_{W \times W}, \cdot|_{F \times W}) \leq (V, +, \cdot)$. We also write $W < V$ to indicate that $W$ is
a proper subspace of $V$.


As usual, we will use the same symbols (most often just + and -) for 
addition and scalar multiplication on both a vector space and a subspace. 
This will cause no confusion, since the operations only differ in their domains, 
not in the way they produce output. 

If $V$ is a vector space over $F$, then every subspace $W$ of $V$ is also a subgroup
of $V$ under addition, as we can see by looking at the definitions. The only extra
condition we need for $W$ to be a subspace of $V$ is that $W$ be closed under
scalar multiplication. This is the content of the following result:


Proposition 13.9 (Subspace Test). Let $V$ be a vector space over a field $F$,
and let $W \subseteq V$. Then $W \leq V$ iff

[SST1]: $W$ forms a subgroup of $V$ under addition, and

[SST2]: $W$ is closed under scalar multiplication: that is,

$$\forall x \in W,\ \forall a \in F,\quad a \cdot x \in W.$$


Proof. Left to the reader as Exercise 13.2. 


Let $V$ be a vector space over a field $F$, and let $S \subseteq V$. What is the smallest
subspace $W$ of $V$ which contains $S$?

Well, if $s \in S$ and $a \in F$, then we must have $a \cdot s \in W$. If $s_1, \ldots, s_k \in S$
and $a_1, \ldots, a_k \in F$, then we must have $a_i \cdot s_i \in W$ for each $i$; and since $W$
must be closed under addition, this gives $a_1 \cdot s_1 + a_2 \cdot s_2 + \cdots + a_k \cdot s_k \in W$.


Notation 13.10. Let $V$ be a vector space over a field $F$, and let $S \subseteq V$. Then
the set

$$(S) := \{ a_1 s_1 + \cdots + a_k s_k \;:\; k \in \mathbb{N},\ a_i \in F,\ s_i \in S \}$$

is called the space spanned by $S$ (over $F$). Here as elsewhere, a sum with zero
terms is taken to be the additive identity element, in this case $0_V$.




Lemma 13.11. Let $V$ be a vector space over a field $F$, and let $S \subseteq V$. Then
$(S)$ is a subspace of $V$; it is the smallest subspace of $V$ which contains $S$.


Proof. The proof is so similar to that of Theorem 12.12 that we leave it to 
the reader as Exercise 13.3. 


The reader may have expected to see $(S)$ referred to as “the subspace
generated by $S$,” but by tradition, we use the term spanned here instead.
The element $a_1 s_1 + \cdots + a_k s_k$ (with $a_i \in F$ and $s_i \in S$) is referred to as
an $F$-linear combination of $s_1, \ldots, s_k$; sometimes we refer to the elements $a_i$
as coefficients. We may also describe $(S)$ by saying that $(S)$ is the set of all
$F$-linear combinations of elements of $S$.

Often, we take the opposite point of view to that in Lemma 13.11. Namely, 
we start with a vector space W, and ask for a set S with the property that 
(S) = W. From this point of view, we say that S spans W. 

There is an analogue for vector spaces of Lemma 12.15: 


Lemma 13.12. Let $V$ be a vector space over a field $F$, and let $S =
\{s_1, \ldots, s_n\}$ be a finite subset of $V$. Then we have

$$(S) = \left\{ \sum_{i=1}^{n} a_i s_i \;:\; a_i \in F \right\}.$$


Proof. Exercise 13.4. 


Remark 13.13. As a special case of Lemma 13.12, we have $(\emptyset) = \{0_V\}$;
compare Remark 12.11.
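
Over a finite field, the description of a span in Lemma 13.12 can be turned directly into a computation. The sketch below assumes $F = \mathbb{Z}/p\mathbb{Z}$ for a prime $p$ and $V = F^n$ with componentwise operations; the names `span` and `passes_subspace_test` are hypothetical helpers. It lists every choice of coefficients $(a_1, \ldots, a_n)$, forms the corresponding linear combination, and then applies the Subspace Test (Proposition 13.9) to the result.

```python
from itertools import product

def span(p, vectors, n):
    """All F-linear combinations of the given vectors in F^n, F = Z/pZ."""
    vectors = list(vectors)
    result = set()
    for coeffs in product(range(p), repeat=len(vectors)):
        combo = tuple(
            sum(c * v[i] for c, v in zip(coeffs, vectors)) % p
            for i in range(n)
        )
        result.add(combo)          # with no vectors, this adds only the zero vector
    return result

def passes_subspace_test(p, W):
    """[SST1] subgroup under addition and [SST2] closed under scalars."""
    W = set(W)
    if not W:
        return False
    closed_add = all(tuple((a + b) % p for a, b in zip(v, w)) in W for v in W for w in W)
    closed_neg = all(tuple((-a) % p for a in v) in W for v in W)
    closed_scalar = all(tuple((c * a) % p for a in v) in W for c in range(p) for v in W)
    return closed_add and closed_neg and closed_scalar

W = span(3, [(1, 2)], 2)
print(sorted(W))                       # [(0, 0), (1, 2), (2, 1)]
print(passes_subspace_test(3, W))      # True
print(span(3, [], 2))                  # {(0, 0)}
```

The last line is the finite-field version of Remark 13.13: the span of the empty set is the zero subspace.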


13.3 Bases: Generalized Coordinate Systems


Let us consider once more the vector space $\mathbb{R}^2$ endowed with componentwise
addition and scalar multiplication, as in Example 13.4. We are used to thinking
of a point in $\mathbb{R}^2$ in terms of its $x$ and $y$ coordinates; indeed, if $P = (x, y)$ is
any point in $\mathbb{R}^2$, then the Cartesian coordinates $x$ and $y$ specify $P$ uniquely;
this is part of what makes the Cartesian coordinate system in the plane so
useful.

Let us attempt to isolate the coordinates $x$ and $y$ as real numbers, as
opposed to pieces inside of a vector, starting with the equation $P = (x, y)$.
We have $P = (x, y) = (x, 0) + (0, y) = x \cdot (1, 0) + y \cdot (0, 1)$. Every vector in $\mathbb{R}^2$
can be written as an $\mathbb{R}$-linear combination of the two vectors $(1, 0)$ and $(0, 1)$.
Furthermore, the real numbers $x$ and $y$ in the expression for $P$ are unique:
we can say that every vector in $\mathbb{R}^2$ can be written uniquely as an $\mathbb{R}$-linear
combination of $(1, 0)$ and $(0, 1)$.




The following definition captures this important property of Cartesian coordinates by simply replacing the special vectors (1,0) and (0,1) by arbitrary vectors, and replacing the vector space $\mathbb{R}^2$ by an arbitrary vector space.


Definition 13.14. Let V be a vector space over a field F. A basis of V over F is a set $B \subseteq V$ such that every vector in V can be written uniquely as an F-linear combination of elements of B.


Remark 13.15. A basis can be thought of as a coordinate system for a vector space, with coordinates which are elements of the field F. More specifically, if B is a basis of V over F, then every vector $v \in V$ has unique “coordinates” with respect to B: namely, the coefficients we need in order to write v as an F-linear combination of elements of B.


Remark 13.16. The plural of basis is bases. 


Example 13.17. The set $B := \{(1,0), (0,1)\}$ is a basis of the vector space $\mathbb{R}^2$ under componentwise operations. Indeed, this was our motivating example for the definition of basis. The coordinates of a vector $v \in \mathbb{R}^2$ with respect to this basis are the usual Cartesian coordinates of v.


Example 13.18. It is also instructive to try to find a basis of $\mathbb{C}$ over $\mathbb{R}$. We claim that the set $\{1, i\}$ is such a basis. Indeed, every complex number z has a unique representation as $z = x + iy = x \cdot 1 + y \cdot i$ with $x, y \in \mathbb{R}$. This is exactly what it means to say that the set $\{1, i\}$ is a basis of $\mathbb{C}$ over $\mathbb{R}$!


Remark 13.19. The bases we examined in the previous two examples are by no means the only bases for $\mathbb{R}^2$ or $\mathbb{C}$ (over $\mathbb{R}$), although they are perhaps the most natural. In fact, it turns out that any two points P and Q which do not lie on a straight line containing the origin will form a basis of $\mathbb{R}^2$ over $\mathbb{R}$. In a sense, “most” sets of 2 elements of $\mathbb{R}^2$ form a basis of $\mathbb{R}^2$ over $\mathbb{R}$. Later, in Theorem 13.24, we shall prove that every basis of a given vector space has the same size; so the number 2 cannot be altered here!
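
The claim about pairs of points can be explored numerically. The sketch below is an illustration added here, under the assumption that we work with rational coordinates (Q^2 in place of R^2) so that the arithmetic is exact. It uses the standard fact that P and Q lie on a common line through the origin exactly when the 2 by 2 determinant formed from them is zero, and otherwise recovers the coordinates of a vector by Cramer's rule. The name coordinates_in_basis is hypothetical.

```python
from fractions import Fraction

def coordinates_in_basis(p, q, v):
    """Return (a, b) with a*p + b*q == v, or None if {p, q} is not a basis of Q^2."""
    det = Fraction(p[0]) * q[1] - Fraction(p[1]) * q[0]
    if det == 0:
        return None            # p and q lie on one line through the origin
    # Cramer's rule for the 2x2 system a*p + b*q = v
    a = (Fraction(v[0]) * q[1] - Fraction(v[1]) * q[0]) / det
    b = (Fraction(p[0]) * v[1] - Fraction(p[1]) * v[0]) / det
    return (a, b)

coords = coordinates_in_basis((1, 1), (1, -1), (3, 1))      # 2*(1,1) + 1*(1,-1) = (3,1)
degenerate = coordinates_in_basis((1, 2), (2, 4), (3, 1))   # (2,4) = 2*(1,2)
```

When a pair is returned it is the only possible one, which is the uniqueness required by Definition 13.14.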


Example 13.20. Since every subspace is a vector space in its own right, we can look for a basis of a given subspace W of a vector space V over a field F. The smallest subspace of any vector space V is the zero subspace, $W = \{0_V\}$. We don’t have many choices when it comes to finding a basis B for $\{0_V\}$, since we must have $B \subseteq \{0_V\}$. We may be tempted to try $B = \{0_V\} = W$, but this is no good: for the vector $0_V$ has more than one representation as an F-linear combination of $0_V$, for instance, $0_V = 0_F \cdot 0_V = 1_F \cdot 0_V$. In fact, the empty set $\emptyset$ is the unique basis of the zero subspace. (Compare Remark 13.13.)


It is often helpful to split up the definition of basis into two parts: existence 
and uniqueness. Let us illustrate what we mean. 

Let V be a vector space over a field F, and let B be a basis of V over F.

Then every element of V can be written as an F-linear combination of elements of B (temporarily ignoring the stronger condition of “uniquely” in the definition of basis). This is exactly what it means to say that B spans V; in our notation, that $\langle B \rangle = V$. This is the “existence” part of the definition of basis: it says that, given any $v \in V$, there exist basis elements $b_1, \ldots, b_k$ and field elements $a_1, \ldots, a_k$ such that $v = a_1 \cdot b_1 + \cdots + a_k \cdot b_k$.

Next, the uniqueness part of the definition of basis says that, if we can write

$$v = \sum_{i=1}^{k} a_i b_i = \sum_{i=1}^{k} \tilde{a}_i b_i \qquad (13.1)$$

with $v \in V$, coefficients $a_i, \tilde{a}_i \in F$, and distinct basis vectors $b_i \in B$, then we must have $a_i = \tilde{a}_i$ for all i from 1 to k. In other words, the coefficients of the basis elements in the expression for v as an F-linear combination of elements of B are uniquely determined by v.

The very careful reader may at this point have something to complain about. Namely, in Equation 13.1, we implicitly assumed that our two ways of writing v both used the same set $b_1, \ldots, b_k$ of basis elements. But what if we have

$$v = a_1 b_1 + a_2 b_2 = \tilde{a}_2 b_2 + \tilde{a}_3 b_3$$

for some $a_1, a_2, \tilde{a}_2, \tilde{a}_3 \in F$ and distinct $b_1, b_2, b_3 \in B$? Surely, the uniqueness part of the basis definition has something to say about this situation, too?

Yes, it does. The simple insight here is that we can write both representations of v in an expanded form, so that they both use all three of the basis vectors $b_1$, $b_2$, and $b_3$:

$$v = a_1 b_1 + a_2 b_2 + 0_F \cdot b_3 = 0_F \cdot b_1 + \tilde{a}_2 b_2 + \tilde{a}_3 b_3.$$

This is in the form of Equation 13.1, so we conclude that $a_1 = 0_F$, $a_2 = \tilde{a}_2$, and $\tilde{a}_3 = 0_F$. In general, whenever we have two representations of a vector as a linear combination of basis elements, we can always use the same finite set of basis elements in both representations, by adding extra terms with zero coefficients, as needed. And of course, in case our basis is a finite set, we can always use all elements of the basis in representing any vector.

It turns out that the question of uniqueness in the definition of a basis can be reduced to the question of whether the vector $0_V$ has a unique representation; the only idea involved in the proof is to move everything to the same side of the equation:


Lemma 13.21. Let V be a vector space over a field F, and let $B \subseteq V$. Then B is a basis of V over F iff $\langle B \rangle = V$ and $0_V$ has a unique representation as an F-linear combination of elements of B (namely, where all coefficients are $0_F$).


Proof. ($\Rightarrow$): Suppose that B is a basis of V over F. Then every vector in V has a unique representation as an F-linear combination of elements of B. So B spans V, and in particular, the vector $0_V \in V$ has a unique representation as an F-linear combination of elements of B.

($\Leftarrow$): Suppose that B spans V and that $0_V$ has a unique representation as an F-linear combination of elements of B. Let $a \in V$. Since $\langle B \rangle = V$, we can write $a = \sum_{i=1}^{k} a_i \cdot b_i$ for some positive integer k and some coefficients $a_i \in F$ and distinct basis vectors $b_i \in B$. Suppose that we also have $a = \sum_{i=1}^{\tilde{k}} \tilde{a}_i \cdot \tilde{b}_i$ with $\tilde{a}_i \in F$ and $\tilde{b}_i \in B$. First, by adding terms with zero coefficients as needed, we may assume that $\tilde{k} = k$ and $\tilde{b}_i = b_i$ for all i from 1 to k. Therefore, we have

$$a = \sum_{i=1}^{k} a_i \cdot b_i = \sum_{i=1}^{k} \tilde{a}_i \cdot b_i.$$

From this, we get

$$\sum_{i=1}^{k} (a_i - \tilde{a}_i) \cdot b_i = 0_V.$$

But we can also write

$$\sum_{i=1}^{k} 0_F \cdot b_i = \sum_{i=1}^{k} 0_V = 0_V.$$

Now the uniqueness of the representation of $0_V$ as an F-linear combination of basis elements tells us that $a_i - \tilde{a}_i = 0_F$ for each i. Therefore, $a_i = \tilde{a}_i$ for each i, as required.


Next we isolate and give a name to the condition that the zero vector has 
a unique representation as an F-linear combination of elements of a set B. 


Definition 13.22. Let V be a vector space over a field F. Let $S \subseteq V$. We say that S is linearly independent over F if whenever $s_1, \ldots, s_n$ are distinct elements of S and $a_1, \ldots, a_n$ are arbitrary (not necessarily distinct) elements of F, then

$$\sum_{j=1}^{n} a_j s_j = 0_V \implies \forall j,\ a_j = 0_F. \qquad (13.2)$$

Otherwise, we say that S is linearly dependent over F.


Using Definition 13.22, we can restate Lemma 13.21 as follows: 
Lemma 13.21′. Let V be a vector space over a field F, and let $B \subseteq V$. Then B is a basis of V over F iff B spans V over F and B is linearly independent over F.
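
For finite sets of vectors in Q^n, the two conditions in Lemma 13.21′ can be checked mechanically. The following sketch is an added illustration, not the text's method: it relies on the standard linear-algebra fact that distinct vectors are linearly independent exactly when Gaussian elimination finds as many pivots as there are vectors, and that they span Q^n exactly when it finds n pivots. The helper names rank, is_linearly_independent, and is_basis_of_Qn are hypothetical, and the input list is assumed to contain distinct vectors.

```python
from fractions import Fraction

def rank(vectors):
    """Number of pivots found by Gaussian elimination over Q."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0                                   # rows[0:r] already hold pivots
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def is_linearly_independent(vectors):
    """Only the trivial combination of the (distinct) vectors gives zero."""
    return rank(vectors) == len(vectors)

def is_basis_of_Qn(vectors, n):
    """Independent and spanning, as in Lemma 13.21'."""
    return is_linearly_independent(vectors) and rank(vectors) == n
```

Note that the one-element set containing only the zero vector is reported as dependent, in agreement with Example 13.20.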

We have noted that every basis of a vector space V must span V. The 
next result says that a basis of V spans V “efficiently”: we cannot remove any 
elements of a basis without losing this spanning property. 


Theorem 13.23. A basis is a minimal spanning set, and conversely. That is: Let V be a vector space over a field F, and let $S \subseteq V$. Then S is a basis of V over F iff S spans V and no proper subset of S spans V.


Proof. ($\Rightarrow$): Suppose that S is a basis of V over F. Then S spans V, by Lemma 13.21. Assume for a contradiction that S is not a minimal spanning set for V. Then there is some vector $s \in S$ such that $S - \{s\}$ also spans V. Let $\tilde{S} = S - \{s\}$. Now since $s \in V = \langle \tilde{S} \rangle$, we can write

$$s = a_1 \cdot \tilde{s}_1 + \cdots + a_k \cdot \tilde{s}_k \qquad (13.3)$$

for some $a_i \in F$ and $\tilde{s}_i \in \tilde{S}$. More simply, we can also write

$$s = 1_F \cdot s. \qquad (13.4)$$

But Equations 13.3 and 13.4 are each representations of s as an F-linear combination of elements of S (since $\tilde{S} \subseteq S$). The coefficient of s in Equation 13.4 is $1_F$, whereas in Equation 13.3 the coefficient of s is $0_F$ (since s cannot be among the $\tilde{s}_i$’s). Since $1_F \neq 0_F$ (by the definition of a field), the representation of s as an F-linear combination of elements of S is not unique. This contradicts the definition of a basis, and completes the proof of the forward implication.

($\Leftarrow$): Suppose that S is a minimal spanning set for V over F.

[Show: S is a basis of V over F]

[Strategy: Use Lemma 13.21]

[Show: $\langle S \rangle = V$ and $0_V$ has a unique representation]

Then in particular, S spans V. Suppose that $0_V = \sum_{i=1}^{k} a_i s_i$ where $k \in \mathbb{N}$, $a_i \in F$, and $s_1, \ldots, s_k$ are distinct elements of S. [Show that $a_i = 0_F$ for all i]

Assume for a contradiction that $a_i \neq 0_F$ for some i; without loss of generality, say $a_1 \neq 0_F$. [Idea: Solve for $s_1$]

Since F is a field and $a_1 \in F - \{0\}$, we have $a_1 \in F^\times$, so we can multiply both sides by $a_1^{-1}$ and solve for $s_1$:

$$0_V = a_1 s_1 + a_2 s_2 + \cdots + a_k s_k$$
$$a_1^{-1} \cdot 0_V = a_1^{-1} \cdot (a_1 s_1 + a_2 s_2 + \cdots + a_k s_k)$$
$$0_V = s_1 + (a_1^{-1} a_2) s_2 + \cdots + (a_1^{-1} a_k) s_k$$
$$s_1 = (-a_1^{-1} a_2) s_2 + \cdots + (-a_1^{-1} a_k) s_k \qquad (13.5)$$

[Idea: since we can write $s_1$ in terms of the other $s_i$’s, we don’t really need $s_1$ in S] Let $\tilde{S} = S - \{s_1\}$. Equation 13.5 shows that we have $s_1 \in \langle \tilde{S} \rangle$. We certainly also have $\tilde{S} \subseteq \langle \tilde{S} \rangle$. Thus, $\tilde{S} \cup \{s_1\} \subseteq \langle \tilde{S} \rangle$. But this just says that $S \subseteq \langle \tilde{S} \rangle$. Now, $\langle \tilde{S} \rangle$ is a subspace of V which contains S, and $\langle S \rangle$ is the smallest subspace of V which contains S; therefore, $\langle S \rangle \subseteq \langle \tilde{S} \rangle$. We know that S spans V, so $\langle S \rangle = V$, and $V \subseteq \langle \tilde{S} \rangle$. But $\langle \tilde{S} \rangle$ is also a subspace of V, which gives the containment $\langle \tilde{S} \rangle \subseteq V$. Therefore, $\langle \tilde{S} \rangle = V$. So $\tilde{S}$ is a proper subset of S which spans V, contradicting the fact that S is a minimal spanning set for V.

This contradiction shows that $a_i = 0_F$ for all i, and thus $0_V$ has a unique representation as an F-linear combination of elements of S. By Lemma 13.21, S is a basis of V over F.


Theorem 13.24 (Invariance of Basis Size). Suppose that V is a vector space over a field F with a finite basis B of size n. Then every basis of V over F has size n.




Proof. Let V be a vector space over a field F. Let B be a basis of V over F. Suppose that B is finite, and let $n = |B|$. Suppose that C is any basis of V over F. We proceed by induction on $|B - C|$.

Base Case: $|B - C| = 0$.

In this case, we have $B \subseteq C$. Now we cannot have $B \subsetneq C$, because C is a minimal spanning set for V (by Theorem 13.23), and B also spans V. Therefore, $B = C$, so $|C| = n$, as required.

Inductive Step: Let $k = |B - C| \geq 1$, and suppose that for any basis $\tilde{C}$ of V over F with $|B - \tilde{C}| < k$, we have $|\tilde{C}| = n$.

[Idea: Replace an element of $C - B$ in C by an element of $B - C$]

We complete the proof using a series of claims, which we outline now:

Claim 1: $\exists x \in C - B$.

Claim 2: $\exists y \in B - C$ such that $y \notin \langle C - \{x\} \rangle$.

Claim 3: The set $T := (C - \{x\}) \cup \{y\}$ is a basis of V over F.

Proof of Claim 1: If $C \subseteq B$, then, arguing as we did in the base case above, we would have $C = B$, contradicting $|B - C| \geq 1$. Therefore, $\exists x \in C - B$.

Proof of Claim 2: Let $C' = C - \{x\}$. Since C is a minimal spanning set for V, and $C' \subsetneq C$, we must have $\langle C' \rangle \neq V$. But if $B \subseteq \langle C' \rangle$, then we would have $V = \langle B \rangle \subseteq \langle C' \rangle$, since $\langle B \rangle$ is the smallest subspace of V containing B. Therefore, $\exists y \in B - \langle C' \rangle$.

Now as $y \notin \langle C' \rangle$, we have $y \notin C' = C - \{x\}$. Since $x \notin B$ but $y \in B$, we also have $y \neq x$. Therefore, $y \notin C$.

Proof of Claim 3: Set $T = C' \cup \{y\}$. Since $y \in V$ and C is a basis for V over F, we can write $y = \sum_{i=1}^{m} a_i c_i$ for unique coefficients $a_i \in F$, with distinct $c_i \in C$ (for some $m \in \mathbb{N}$). We contend that x must be one of the $c_i$’s with a non-zero coefficient: for otherwise, we would have $y \in \langle C' \rangle$, contradicting the choice of y in the proof of Claim 2 above. Without loss of generality, $x = c_1$, and $a_1 \neq 0_F$. So we have

$$y = a_1 x + \sum_{i=2}^{m} a_i c_i. \qquad (13.6)$$

Since $a_1 \neq 0_F$, we can solve Equation 13.6 for x:

$$x = a_1^{-1} y + \sum_{i=2}^{m} (-a_1^{-1} a_i) c_i.$$

Since we have $c_i \in C'$ for all $i \geq 2$, this shows that $x \in \langle \{y\} \cup C' \rangle = \langle T \rangle$. Now we know that $x \in \langle T \rangle$, and certainly $C' \subseteq \langle T \rangle$, so $C = C' \cup \{x\} \subseteq \langle T \rangle$. Therefore $\langle C \rangle \subseteq \langle T \rangle$; and since C is a basis of V, this gives $\langle T \rangle = V$.

Consider a representation of $0_V$ as an F-linear combination of elements of T:

$$0_V = w y + \sum_{i=1}^{q} w_i c_i, \qquad (13.7)$$

with $w, w_i \in F$ and distinct elements $c_i \in C'$. First, if $w \neq 0_F$, then we could solve Equation 13.7 for y in terms of the $c_i$’s and find that $y \in \langle C' \rangle$, contradicting the choice of y. Therefore, we have $w = 0$. Now, since C is a basis for V over F and $C' \subseteq C$, the only way to write $0_V$ as an F-linear combination of elements of $C'$ is to make all the coefficients equal to 0. Therefore, we have $w_i = 0_F$ for all i.

We have shown that T spans V and that $0_V$ has a unique representation as an F-linear combination of elements of T. By Lemma 13.21, T is a basis of V over F.

Now we complete the proof of Theorem 13.24. Note that $|B - T| = |B - C| - 1$, since we formed T out of C by replacing an element of $C - B$ with an element of $B - C$. Specifically, we have $B - T = (B - C) - \{y\}$. By inductive hypothesis, we conclude that $|T| = |B| = n$. In particular, T is finite, and we have $C = (T - \{y\}) \cup \{x\}$, so $|C| = |T| = n$, as required.


Definition 13.25. Let V be a vector space over a field F. The dimension of V over F is the size of any basis of V over F. This size, if finite, is independent of the particular basis chosen, by Theorem 13.24.


Notation 13.26. We write $\dim_F(V)$ for the dimension of V over F.


Remark 13.27. The reader may also be curious about infinite-dimensional 
vector spaces. Such vector spaces certainly exist and are important, although 
we are avoiding them in our results (such as Theorem 13.24). It turns out 
that even in the infinite-dimensional case, two bases of a given vector space 
will have the same infinite size (i.e., there exists a bijection from one to the 
other), and so Definition 13.25 still makes sense. 


Remark 13.28. A natural question is, Does every vector space have a basis? In practice, the vector spaces we shall study will all have bases, but it is important to justify any assertion that a particular vector space has a basis. In general, the Axiom of Choice implies that any vector space has a basis, if we are willing to accept that axiom. At any rate, when we write “$\dim_F(V) = n < \infty$,” we are asserting that V has a finite basis over F, and thus, justification is required.


Example 13.29. Looking back at Example 13.17, we found that the set $\{(1,0), (0,1)\}$ is a basis for $\mathbb{R}^2$ over $\mathbb{R}$ (under componentwise operations). This basis has size 2, so we have $\dim_{\mathbb{R}}(\mathbb{R}^2) = 2$. No matter what other basis of $\mathbb{R}^2$ we come up with, that basis must also have size 2. We say that $\mathbb{R}^2$ is a two-dimensional vector space over $\mathbb{R}$. This agrees with our geometric understanding of “dimension” as the number of independent coordinates we need in order to describe an element of a space. In general, the dimension of a vector space is the number of basis elements, and each basis element acts like a coordinate.


The following result says that a basis of a subspace can always be enlarged to form a basis of the whole space; this ensures that smaller vector spaces have smaller dimensions. See Exercise 13.8 for a strengthening of this result where we drop the hypothesis that $\dim_F(W) < \infty$.




Lemma 13.30. Let $W \leq V$ be vector spaces over a field F. Suppose that $k = \dim_F(W) < \infty$ and that $B = \{b_1, \ldots, b_k\}$ is a basis for W over F. Also suppose $\dim_F(V) < \infty$. Then B is contained in some basis of V over F, and consequently we have $\dim_F(W) \leq \dim_F(V)$.


Proof. Let $n = \dim_F(V)$, and let $\mathcal{B} = \{\beta_1, \ldots, \beta_n\}$ be a basis for V over F. We induct on the quantity $d := |\mathcal{B} - W|$.

Base Case: $d = 0$. Then $\mathcal{B} \subseteq W$, so $\langle \mathcal{B} \rangle \leq W$; but also, $\langle \mathcal{B} \rangle = V \supseteq W$; so $W = V$, and B is already a basis for V over F.

Inductive Step: Suppose that $d \geq 1$. Then $\exists i$ such that $\beta_i \notin W$. Set $B' = B \cup \{\beta_i\}$, and let $W' = \langle B' \rangle \leq V$. By construction, $B'$ spans $W'$. Suppose that $\left(\sum_{j=1}^{k} a_j b_j\right) + a \beta_i = 0$ with $a_j, a \in F$. If $a \neq 0$, then we could solve for $\beta_i$ and find that $\beta_i \in \langle B \rangle = W$, which is false. Therefore, $a = 0$. Since B is linearly independent over F, we can conclude that $a_j = 0$ for each j. This shows that $B'$ is linearly independent over F. By Lemma 13.21, $B'$ is a basis of $W'$ over F. Now we certainly have $|\mathcal{B} - W'| < d$, so by inductive hypothesis, $B'$ is contained in some basis of V over F. Since $B \subseteq B'$, we are done.
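
The induction above can be read as a procedure: run through a known basis of V and adjoin each vector that is not yet in the span of what has been collected. The sketch below is an added illustration of that idea in Q^n, not the author's algorithm; it decides membership in the span by comparing ranks, a standard technique assumed here. The names rank and extend_to_basis are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Number of pivots found by Gaussian elimination over Q."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def extend_to_basis(independent, known_basis):
    """Enlarge a linearly independent list using vectors from a known basis."""
    basis = list(independent)
    for beta in known_basis:
        # beta lies outside the current span exactly when adjoining it raises the rank
        if rank(basis + [beta]) > rank(basis):
            basis.append(beta)
    return basis

standard = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
extended = extend_to_basis([(1, 1, 0)], standard)
```

Here (0,1,0) is skipped because it equals (1,1,0) minus (1,0,0), so it already lies in the span of the vectors collected before it.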


13.4 Exercises 


Exercise 13.1. Suppose that $(R, +, \cdot)$ is a ring with 1 and F is a field which is a subring of R. Prove that $(R, +, \cdot|_{F \times R})$ is a vector space over F.


Exercise 13.2. Supply a proof of the Subspace Test, Proposition 13.9. 
Exercise 13.3. Prove Lemma 13.11. 
Exercise 13.4. Prove Lemma 13.12. 


Exercise 13.5. Let V be a vector space over F.

(a) Prove that any subset of a linearly independent set is linearly independent: that is, if $S \subseteq V$ is linearly independent over F and $T \subseteq S$, then T is also linearly independent over F.

(b) Prove that any superset of a linearly dependent set is linearly dependent: that is, if $S \subseteq V$ is linearly dependent over F and $T \supseteq S$, then T is also linearly dependent over F.


Exercise 13.6. Let V be a vector space over a field F, and let $S \subseteq V$. Prove that if S is linearly independent over F, then S is a basis of $\langle S \rangle$ over F.


Exercise 13.7. Prove that a basis (of a given vector space V) is the same thing 
as a maximal linearly independent subset of V. (Compare this to Theorem 
13.23.) 


Exercise 13.8 (A Subspace of a Finite-Dimensional Vector Space is Finite-Dimensional). Let V be a vector space over a field F, with $\dim_F(V) = n < \infty$.

(a) Prove that if S is a subset of V such that $|S| > n$, then S is linearly dependent over F. (Hint: Exercises 13.5 and 13.6 may be useful.)

(b) Let $W \leq V$. Prove that W has a finite basis over F, and that $\dim_F(W) \leq \dim_F(V)$.


Exercise 13.9. This exercise concerns a variation on Lemma 13.30. Prove or disprove that if $W \leq V$ are finite-dimensional vector spaces over a field F, then any basis B of V over F must contain a subset $B'$ which is a basis of W over F.


Exercise 13.10. Let V be a vector space over a field F, and suppose that F is infinite. Use the following steps to prove that V cannot be written as a finite union of proper subspaces.

(a) Assume that $V = \bigcup_{j=1}^{n} V_j$, where $V_j < V$, and n is minimal. Show that for all j, we have $V_j \not\subseteq \bigcup_{i \neq j} V_i$.

(b) Note that we must have $n \geq 2$, and for $j \in \{1, 2\}$ choose $a_j \in V_j - \bigcup_{i \neq j} V_i$. Prove that $\exists c_1, c_2 \in F$ and $i \in \{1, \ldots, n\}$ such that $c_1 \neq c_2$ and both $a_1 + c_1 a_2$ and $a_1 + c_2 a_2$ belong to $V_i$.

(c) Prove that we have both $a_1 \in V_i$ and $a_2 \in V_i$, contradicting the choice of $a_j$ in part (b).


In the remaining exercises, we make use of the following definitions. 


Definition 13.31. The notion corresponding to a “homomorphism” in the category of vector spaces is called a linear transformation. If V and W are vector spaces over the same field F, then we define a linear transformation from V to W to be a function $t : V \to W$ such that

(LT1) $\forall a, b \in V$, $t(a + b) = t(a) + t(b)$, and

(LT2) $\forall a \in V\ \forall c \in F$, $t(c \cdot a) = c \cdot t(a)$.
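
Conditions (LT1) and (LT2) can be spot-checked on sample inputs. The sketch below is an added illustration for maps from Q^2 to Q^2; all names in it (add, scale, violates_linearity, shear, shift) are hypothetical. Testing finitely many inputs can refute linearity but cannot prove it, so a result of False means only that no violation was found among the samples.

```python
from fractions import Fraction
from itertools import product

def add(u, v):
    """Componentwise vector addition."""
    return tuple(a + b for a, b in zip(u, v))

def scale(c, v):
    """Componentwise scalar multiplication."""
    return tuple(c * a for a in v)

def violates_linearity(t, vectors, scalars):
    """Return True if (LT1) or (LT2) fails on some sampled input."""
    for a, b in product(vectors, repeat=2):
        if t(add(a, b)) != add(t(a), t(b)):          # (LT1)
            return True
    for c, a in product(scalars, vectors):
        if t(scale(c, a)) != scale(c, t(a)):         # (LT2)
            return True
    return False

samples = [(Fraction(1), Fraction(0)), (Fraction(0), Fraction(1)), (Fraction(2), Fraction(-3))]
scalars = [Fraction(0), Fraction(1, 2), Fraction(-4)]

def shear(v):
    return (v[0] + v[1], 2 * v[0])      # satisfies (LT1) and (LT2)

def shift(v):
    return (v[0] + 1, v[1])             # a translation: fails (LT1)
```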


Definition 13.32. The kernel of a linear transformation $t : V \to W$ is
$\ker(t) := \{a \in V : t(a) = 0_W\}$.


Definition 13.33. A linear transformation is called an isomorphism (of vector spaces) if it is bijective, and an embedding if it is injective. We write $V \cong W$ to mean that there is an isomorphism from V to W.


Exercise 13.11. Prove that 
(i) The kernel of a linear transformation is a subspace of its domain, and 
(ii) The image of a linear transformation is a subspace of its codomain. 


Exercise 13.12. Let V be a vector space of finite dimension n over a field F. Prove that $V \cong F^n$, where, as usual, $F^n$ denotes the vector space with underlying set equal to the n-th Cartesian power of F under componentwise addition and scalar multiplication (see Example 13.4). Thus, a finite-dimensional vector space over any given field is uniquely determined up to isomorphism by its dimension.




Exercise 13.13. Let $t : V \to W$ be a linear transformation, where V and W are vector spaces over a field F, and suppose that $\dim_F(V) < \infty$. Prove that $\dim_F(\ker(t)) + \dim_F(t(V)) = \dim_F(V)$.

Exercise 13.14. Let U, V, and W be vector spaces over the same field F, and suppose that $\sigma : U \to V$ and $t : V \to W$ are linear transformations. Prove that $t \circ \sigma$ is also a linear transformation.


Exercise 13.15. [Matrices and Linear Transformations, part 1] Let V and W be finite-dimensional vector spaces over a field F, with $\dim_F(V) = n$ and $\dim_F(W) = m$. Suppose that $B = \{b_1, \ldots, b_n\}$ and $C = \{c_1, \ldots, c_m\}$ are bases of V and W, respectively, over F. Let $\mathcal{T}$ denote the set of all linear transformations from V to W, and let $M_{m,n}(F)$ denote the set of all m by n matrices with entries in F; here, m is the number of rows, and n is the number of columns. Let $t : V \to W$ be a linear transformation.

(a) Explain why for each integer j between 1 and n, we can write $t(b_j) = \sum_{i=1}^{m} T_{i,j} c_i$ for uniquely determined elements $T_{i,j} \in F$.

(b) Let T be the m by n matrix whose entry in row i and column j is $T_{i,j}$. Prove that for all $x \in V$, we have $t(x) = \sum_{i=1}^{m} y_i c_i$, where $x = \sum_{j=1}^{n} x_j b_j$ with $x_j \in F$, and $y_i = \sum_{j=1}^{n} T_{i,j} x_j$. Note: we say that T is the matrix of t with respect to the bases B and C. Also note that we are assuming that we know the ordering of the basis elements: thus, it would be more accurate to say that $B = (b_1, \ldots, b_n)$ is an ordered basis of V over F, and similarly for C.

(c) Prove that the map $t \mapsto T$ is a bijection from $\mathcal{T}$ to $M_{m,n}(F)$.


Exercise 13.16. [Matrices and Linear Transformations, part 2] Suppose that U, V, and W are finite-dimensional vector spaces over a field F with dimensions n, m, and $\ell$, respectively. Choose ordered bases of U, V, and W over F. Let $t_1 : U \to V$ and $t_2 : V \to W$ be linear transformations.

(a) Set $t = t_2 \circ t_1$. Let M, N, and T be the matrices of $t_1$, $t_2$, and t respectively, with respect to the given bases. Prove that we have

$$T_{i,j} = \sum_{k=1}^{m} N_{i,k} \cdot M_{k,j} \qquad (13.8)$$

for all i, j with $1 \leq i \leq \ell$ and $1 \leq j \leq n$.

(b) Use Equation 13.8 to define matrix multiplication for arbitrary matrices $M \in M_{m,n}(F)$ and $N \in M_{\ell,m}(F)$: namely, define $N \cdot M = T$. Prove that matrix multiplication is associative whenever the product of three matrices is defined, that is, when the number of columns of the first matrix is equal to the number of rows of the second matrix in each product.
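
Equation 13.8 is easy to try out on small integer matrices. The sketch below is an added numerical illustration and not a proof of part (a): it computes the product by the formula in Equation 13.8 and compares applying the product matrix with applying the two matrices in turn, using the coordinate formula from Exercise 13.15(b). The function names apply and mat_mul and the sample matrices are hypothetical.

```python
def apply(matrix, x):
    """Coordinates y_i = sum_j T[i][j] * x[j], as in Exercise 13.15(b)."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in matrix]

def mat_mul(N, M):
    """Entry (i, j) is sum_k N[i][k] * M[k][j], as in Equation 13.8."""
    inner = len(M)
    if any(len(row) != inner for row in N):
        raise ValueError("columns of N must match rows of M")
    return [[sum(N[i][k] * M[k][j] for k in range(inner))
             for j in range(len(M[0]))]
            for i in range(len(N))]

M = [[1, 2], [0, 1], [3, 0]]      # 3 x 2: coordinates of a map U -> V
N = [[1, 0, 2], [0, 1, 1]]        # 2 x 3: coordinates of a map V -> W
T = mat_mul(N, M)                 # 2 x 2: coordinates of the composite U -> W
x = [5, -1]
```

With zero-based list indices, row i and column j of the text correspond to positions i-1 and j-1 in the lists.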


Exercise 13.17. [Matrices and Linear Transformations, part 3] Let V be a vector space over a field F, with $\dim_F(V) = n < \infty$. Suppose that B is a basis of V over F. Let R be the set of all linear transformations from V to V. When we represent a linear transformation in R as a matrix, we will use the same basis B for both the domain and the codomain.

(a) For $t, u \in R$, define $t + u$ to be the function $s : V \to V$ given by the formula $s(a) = t(a) + u(a)$ for $a \in V$. Prove that $s \in R$.

(b) Prove that $(R, +, \circ)$ is a ring with 1, where $\circ$, as usual, denotes composition of functions.

(c) Let $t, u \in R$, and let $s = t + u$. Let S, T, and U be the matrices of s, t, and u, respectively, with respect to B. Prove that we have $S_{i,j} = T_{i,j} + U_{i,j}$ for all i and j, where, as usual, for a matrix M, the notation $M_{i,j}$ denotes the entry of M in row i and column j.


Remark 13.34. We usually denote $M_{n,n}(F)$ more simply as $M_n(F)$. Because of the bijective correspondence between linear transformations in R and matrices in $M_n(F)$, the set of all $n \times n$ matrices over F forms a ring under matrix addition and matrix multiplication. By construction, we have $M_n(F) \cong R$. The multiplicative identity element of $M_n(F)$ is denoted $I_n$ or simply I, and is called the $n \times n$ identity matrix.

The group of units of $M_n(F)$ is denoted $GL_n(F)$ and is called the general linear group of $n \times n$ matrices over F (compare Example 11.7).


Exercise 13.18. [Matrices and Linear Transformations, part 4] Let V be a vector space over a field F, with $\dim_F(V) = n < \infty$. Let R be the ring of all linear transformations from V to V over F. For an element $a \in F$, define a function $\mu_a : V \to V$ by the formula $\mu_a(v) = a \cdot v$ for $v \in V$.

(a) Show that $\mu_a$ is a linear transformation.

(b) Show that the matrix of $\mu_a$ is the $n \times n$ matrix A with

$$A_{i,j} = \begin{cases} a, & i = j, \\ 0, & i \neq j, \end{cases}$$

no matter what basis we choose for V. Note: such a matrix A is called a scalar matrix.

(c) Prove that the center C(R) of R is the set $S := \{\mu_a : a \in F\}$. (See Exercise 11.18.)

(d) Prove that the map $a \mapsto \mu_a$ is a ring isomorphism from F to C(R).

(e) Define a function $\lambda : F \times R \to R$ by the formula $\lambda(a, t) = \mu_a \circ t$. Prove that $(R, +, \lambda)$ is an F-vector space, where + is the addition operation from Exercise 13.17.

(f) Use the correspondence between matrices and linear transformations together with part (e) above to make $M_n(F)$ into an F-vector space. Prove that for any matrix $T \in M_n(F)$ and any $a \in F$, the matrix aT can be obtained by multiplying every entry of T by a.




14 


Polynomial Rings 


14.1 Polynomials Over a Commutative Ring 


Polynomials form a central topic of study from the very first courses of school 
algebra, immediately after the basics of arithmetic. What is a polynomial, 
though? 

Informally, we might say that a polynomial is a thing made out of powers 
of a variable (often called x) multiplied by various coefficients (which are 
numbers), and added together, as in 


14a? — 247 + 7. 


Even more important is the question: What can we do with polynomials? At 
the least, we can add and multiply polynomials. This suggests that the set of 
all polynomials may form a ring; and this is true. 

In our treatment of polynomials, we will not restrict the coefficients in 
polynomials to be numbers; instead, we will allow the coefficients to come 
from any fixed commutative ring with 1. In a sense, polynomials are the most 
“generic” things that can be added and multiplied commutatively; as a result, 
the subject of commutative rings is all about polynomials. 


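Definition 14.1. Let R be a commutative ring with 1. The polynomial ring in the variable x with coefficients in R is the ring

$$R[x] := \{a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n : n \in \mathbb{N},\ a_i \in R\},$$

with addition and multiplication given by

$$\sum_{k=0}^{n} a_k x^k + \sum_{k=0}^{m} b_k x^k = \sum_{k=0}^{\max\{m,n\}} (a_k + b_k) x^k \qquad (14.1)$$

and

$$\left(\sum_{k=0}^{n} a_k x^k\right) \cdot \left(\sum_{k=0}^{m} b_k x^k\right) = \sum_{k=0}^{m+n} \left(\sum_{j=0}^{k} a_j b_{k-j}\right) x^k. \qquad (14.2)$$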

In these formulas, we treat missing coefficients as zero: that is, as $0_R$. In other words, we identify $0_R x^k$ with $0_{R[x]}$. More simply, we will write $0 x^k = 0$. We also follow the convention of writing $1_R x^k$ as $x^k$ and $a x^0$ as a for $a \in R$.

DOI: 10.1201/9781003252139-14

Two polynomials are considered to be equal iff all their corresponding coefficients are equal: that is,

$$\sum_{k=0}^{n} a_k x^k = \sum_{k=0}^{m} b_k x^k \quad \text{iff} \quad a_k = b_k \text{ for all } k \text{ s.t. } 0 \leq k \leq \max\{m,n\}. \qquad (14.3)$$


Definition 14.2. By a term of the polynomial $\sum_{k=0}^{n} a_k x^k$, we mean any of the individual expressions $a_k x^k$.


Remark 14.3. The reader should verify that the formulas defining polynomial addition and multiplication give the same results as the usual process for performing these operations on polynomials as taught in school algebra. In fact, the definitions of polynomial addition and multiplication are designed exactly so that polynomials follow the same rules of arithmetic as ordinary numbers: for example, $x^2 \cdot x^3 = x^5$, $x + x = 2x$, and so on. These formulas are true whether x is a variable in a polynomial ring or a number in R.
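
One way to carry out the verification suggested in this remark is to compute with coefficient lists. The sketch below is an added illustration in which a polynomial with integer coefficients is stored as the list of its coefficients, constant term first; the representation and the names poly_add and poly_mul are assumptions of this sketch, and the empty list is used for the zero polynomial.

```python
def poly_add(a, b):
    """Equation 14.1: add coefficientwise, treating missing coefficients as 0."""
    size = max(len(a), len(b))
    return [(a[k] if k < len(a) else 0) + (b[k] if k < len(b) else 0)
            for k in range(size)]

def poly_mul(a, b):
    """Equation 14.2: the coefficient of x^k is the sum of a_j * b_(k-j)."""
    if not a or not b:
        return []                       # a product with the zero polynomial
    result = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            result[i + j] += ai * bj    # a_i x^i times b_j x^j lands on x^(i+j)
    return result

total = poly_add([1, 2], [3, 0, 5])             # (1 + 2x) + (3 + 5x^2)
difference_of_squares = poly_mul([1, 1], [1, -1])   # (1 + x)(1 - x)
power = poly_mul([0, 0, 1], [0, 0, 0, 1])       # x^2 * x^3
```

The double loop groups the products by the exponent i + j, which is the same bookkeeping as the inner sum in Equation 14.2.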


Remark 14.4. We have $R \leq R[x]$. We can view R as the set of all “constant polynomials” in R[x].

Remark 14.5. As usual when we choose a variable name, the name x is not special here; we may speak of polynomials in other variables, and form polynomial rings such as R[t] or R[y]. The important thing is that our variable not already be in use (especially that it not be an element of R!).


Remark 14.6. The polynomial ring R[x] can be thought of as the result of taking the ring R and adding a new element, x, which behaves in the most “free” way possible (while commuting with elements of R). Here, as in our discussion of free groups in Chapter 6, the meaning of free is relation-free. The way that this freeness manifests itself in the case of polynomials is packed into the definition of when two polynomials are equal, Equation 14.3. Specifically, when the right side of Equation 14.3 is 0, we have

$$\sum_{k=0}^{n} a_k x^k = 0 \quad \text{iff} \quad a_k = 0 \text{ for all } k \text{ s.t. } 0 \leq k \leq n. \qquad (14.4)$$

This equation may be viewed as saying that there are no non-trivial algebraic relations among x and the elements of R.

Remark 14.7. We have not managed to provide a very satisfactory notion of what kind of thing a polynomial really is, in terms of standard mathematical objects. Formally, we should define a polynomial with coefficients in R to be a function $f : \mathbb{N} \to R$ such that $f(j) = 0$ for all large enough values of $j \in \mathbb{N}$. Then f corresponds to the polynomial $\sum_{j \in \mathbb{N}} f(j) x^j$. In practice, however, this formal treatment of polynomials is more clumsy than the familiar presentation recalled above. The reader is invited to verify all statements about polynomials from this formal point of view, now and in the future, while sticking with the familiar polynomial notation in actual work.
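
The formal point of view translates directly into a data structure. In the sketch below, added as an illustration, a polynomial is a Python dictionary sending an exponent j to the coefficient f(j), with every exponent not listed understood to have coefficient 0; this encoding and the names coefficient and poly_equal are assumptions of the sketch. Equality is then tested coefficient by coefficient, as in Equation 14.3.

```python
def coefficient(f, j):
    """The value f(j); exponents that are not stored have coefficient 0."""
    return f.get(j, 0)

def poly_equal(f, g):
    """Equation 14.3: equal iff all corresponding coefficients agree."""
    return all(coefficient(f, j) == coefficient(g, j) for j in set(f) | set(g))

f = {0: 7, 2: -2}                  # 7 - 2x^2
g = {0: 7, 1: 0, 2: -2, 5: 0}      # the same polynomial with explicit zero terms
h = {0: 7}                         # the constant polynomial 7
```

Only exponents stored in at least one of the two dictionaries need to be compared, since all other coefficients are 0 on both sides.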




Lemma 14.8. If R is a commutative ring with 1, then so is R[x].


Proof. Let R be a commutative ring with 1. So far, we have not even established that R[x] is a ring. We outline the proof below.

Let $f, g, h \in R[x]$. Let $n \in \mathbb{N}$ be the highest power of x which appears in any of the polynomials f, g, and h. By adding extra terms with a coefficient of 0 as needed, we can write all three polynomials in terms of the powers of x up to $x^n$. That is, we can write $f = \sum_{k=0}^{n} a_k x^k$, $g = \sum_{k=0}^{n} b_k x^k$, and $h = \sum_{k=0}^{n} c_k x^k$ with $a_k, b_k, c_k \in R$. First, we see from Equation 14.1 that R[x] is closed under addition, since for $a_k, b_k \in R$ we have $a_k + b_k \in R$. Associativity of R[x] under addition likewise follows from associativity of R under addition: we have $(f + g) + h = \sum_{k=0}^{n} [(a_k + b_k) + c_k] x^k$ and $f + (g + h) = \sum_{k=0}^{n} [a_k + (b_k + c_k)] x^k$. The element $0_R$ is an additive identity element for R[x], and the element $\sum_{k=0}^{n} (-a_k) x^k$ is an additive inverse for $\sum_{k=0}^{n} a_k x^k$. Commutativity of addition in R[x] follows from the same property in R. Thus, $(R[x], +)$ is an abelian group.

We can see from Equation 14.2 that R[x] is closed under multiplication, since R is closed under multiplication and addition. To prove associativity and commutativity of multiplication in R[x], it will be helpful to rewrite the polynomial multiplication formula (Equation 14.2) in a more symmetric way:

$$\left(\sum_{k=0}^{n} a_k x^k\right) \cdot \left(\sum_{k=0}^{m} b_k x^k\right) = \sum_{k=0}^{m+n} \left(\sum_{i+j=k} a_i b_j\right) x^k. \qquad (14.5)$$

Since we have $\sum_{i+j=k} a_i b_j = \sum_{i+j=k} b_j a_i = \sum_{j+i=k} b_j a_i$, it follows that multiplication in R[x] is commutative. For associativity, we compute

$$(f \cdot g) \cdot h = \left(\sum_{k=0}^{2n} \left(\sum_{i+j=k} a_i b_j\right) x^k\right) \cdot \left(\sum_{k=0}^{n} c_k x^k\right) \qquad (14.6)$$

$$= \sum_{k=0}^{3n} \left(\sum_{\ell+m=k} \left(\sum_{i+j=\ell} a_i b_j\right) c_m\right) x^k \qquad (14.7)$$

$$= \sum_{k=0}^{3n} \left(\sum_{i+j+m=k} a_i b_j c_m\right) x^k. \qquad (14.8)$$

The computation of $f \cdot (g \cdot h)$ yields the same expression. The proof of the distributive property in R[x] is left to the reader in Exercise 14.1. Finally, note that $x^0$ is a multiplicative identity for R[x].


Recall that the degree of a polynomial in x is the highest power of x which appears in the polynomial. We next define “degree” formally, for a polynomial with coefficients in any commutative ring with 1.




Definition 14.9. Let $R$ be a commutative ring with 1, and let $f \in R[x] - \{0\}$.
Write $f = \sum_{k=0}^{n} a_k x^k$ with $n \in \mathbb{N}$ and $a_k \in R$. The degree of $f$ in $x$ is

\[ \deg(f) = \deg_x(f) = \max\{k : a_k \neq 0\}. \]

We also define $\deg(0) = -\infty$.


The usefulness of the degree concept as a measure of the “size” of a poly- 
nomial is illustrated in the following result. 


Lemma 14.10. Let $R$ be a domain, and let $f, g \in R[x] - \{0\}$. Then $\deg(f \cdot g) =
\deg(f) + \deg(g)$. In particular, $R[x]$ is also a domain.


Proof. We can write $f = a_m x^m + a_{m-1} x^{m-1} + \cdots + a_1 x + a_0$ and
$g = b_n x^n + b_{n-1} x^{n-1} + \cdots + b_1 x + b_0$ for some $m, n \in \mathbb{N}$ and $a_i, b_j \in R$ with $a_m \neq 0$
and $b_n \neq 0$. Thus we have $f \cdot g = a_m b_n x^{m+n} + (\text{terms of smaller degree})$.
Since $R$ is a domain and $a_m, b_n \in R - \{0\}$, we have $a_m b_n \neq 0$. Therefore,
$\deg(f \cdot g) = m + n = \deg(f) + \deg(g)$, which proves the first statement. In
particular, $f \cdot g \neq 0$. Together with the fact that $1_R \in R[x]$ is a multiplicative
identity element for $R[x]$, and $1_R \neq 0_R = 0_{R[x]}$, this establishes that $R[x]$ is a
domain.
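
To see why the domain hypothesis matters, we can compare the same product over two coefficient rings. The sketch below is an illustration under our own conventions: coefficient lists are indexed by the power of $x$, coefficients are reduced modulo $n$, and the helper names `trim`, `deg`, and `poly_mul_mod` are hypothetical. Over $\mathbb{Z}/7\mathbb{Z}$ (a field, hence a domain) the degrees add; over $\mathbb{Z}/6\mathbb{Z}$ the leading coefficients 2 and 3 multiply to 0, and the degree drops.

```python
def trim(f):
    """Drop trailing zero coefficients so the last entry is the leading coefficient."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f

def deg(f):
    """Degree of a coefficient list; float('-inf') stands in for deg(0)."""
    f = trim(f)
    return len(f) - 1 if f else float("-inf")

def poly_mul_mod(f, g, n):
    """Multiply coefficient lists with all coefficients reduced modulo n."""
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % n
    return trim(out)

f = [1, 2]  # 1 + 2x
g = [1, 3]  # 1 + 3x

print(deg(poly_mul_mod(f, g, 7)))  # 2: the degrees add over Z/7Z
print(deg(poly_mul_mod(f, g, 6)))  # 1: the x^2 coefficient 6 vanishes in Z/6Z
```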


Another, even easier, result tells us how degrees behave with respect to 
addition: 


Lemma 14.11. Let $R$ be a commutative ring with 1, and let $f, g \in R[x]$. Then
$\deg(f+g) \leq \max\{\deg(f), \deg(g)\}$. Further, $\deg(f+g) = \max\{\deg(f), \deg(g)\}$
if $\deg(f) \neq \deg(g)$.


Proof. Left to the reader as Exercise 14.3. 


One thing that we can’t do with polynomials is divide them by each other: 
at least, if we try to divide two polynomials, the result will in general not 
be a polynomial. The following result tells us exactly which polynomials have 
multiplicative inverses, over a domain. The answer is, only the constant
polynomials which already had inverses in $R$. Again, the notion of degree is key
to the proof.


Lemma 14.12. If $R$ is a domain, then $(R[x])^\times = R^\times$.


Proof. Suppose that $R$ is a domain.

[Show $(R[x])^\times \subseteq R^\times$] Let $f \in (R[x])^\times$. Then $\exists g \in R[x]$ such that $f \cdot g = 1$.
By Lemma 14.10, we have $\deg(f) + \deg(g) = \deg(1)$. But $\deg(1) = 0$; and
since $f, g \neq 0$, we have $\deg(f), \deg(g) \geq 0$. The only possibility now is that
$\deg(f) = \deg(g) = 0$, and so $f, g \in R$. Since $f \cdot g = 1 = 1_R$, we can say
$f \in R^\times$.

[Show $R^\times \subseteq (R[x])^\times$] Let $a \in R^\times$. Then $\exists b \in R$ such that $a \cdot b = 1_R$. Since
$R \leq R[x]$ and $1_R = 1_{R[x]}$, this means that $a \in (R[x])^\times$.




Besides doing addition and multiplication with polynomials, we can also
evaluate a polynomial. For example, if $f(x) = x^2 + 3x - 1 \in \mathbb{R}[x]$, then we
can evaluate $f$ at $x = 5$ to get $f(5) = 5^2 + 3 \cdot 5 - 1 = 39$. We would like to
generalize this idea to polynomials over an arbitrary commutative ring with 1.


Definition 14.13. Let $R$ and $S$ be commutative rings with 1 such that $R \leq S$,
and let $\alpha \in S$. If $f \in R[x]$ with

\[ f = c_n x^n + c_{n-1} x^{n-1} + \cdots + c_0, \]

then we define

\[ f(\alpha) := c_n \alpha^n + c_{n-1} \alpha^{n-1} + \cdots + c_0 \in S. \]

We read $f(\alpha)$ as “$f$ of $\alpha$,” or “$f$ evaluated at $x = \alpha$.” In particular, since
$R \leq R[x] \ni x$, we can consider $f(x)$, and we see that $f(x) = f$.


The reader may have noticed that we began using “function notation” 
in Definition 14.13, writing f(x) instead of just f. This allows us to use a 
consistent notation to denote both f and f(a). Stepping back a bit, we realize 
that evaluation of a polynomial is a way to turn a polynomial into a ring 
element. That is, evaluation at an element of $S$ gives us a function from $R[x]$
to $S$:


Definition 14.14. Let $R$ and $S$ be commutative rings with 1 such that $R \leq S$,
and let $\alpha \in S$. The evaluation-at-$\alpha$ map is the function $\varepsilon_\alpha : R[x] \to S$ given
by

\[ \varepsilon_\alpha(f) = f(\alpha) \]

for $f \in R[x]$.


Remark 14.15. In school algebra, the ability to evaluate a polynomial at a 
number allows us to produce a function from a polynomial. In fact, polyno- 
mials are often viewed in this light, and even confused with the corresponding
functions. For example, if $f(x) = x^2 \in \mathbb{R}[x]$, then $f$ gives a function
$\sigma_f : \mathbb{R} \to \mathbb{R}$ via the formula $\sigma_f(a) = a^2$ for $a \in \mathbb{R}$. However, we should
be careful to distinguish the polynomial $x^2$ from the squaring function $\sigma_f$.
In general, we wish to make a distinction between a polynomial and the cor- 
responding evaluation function. Not only are they technically different kinds 
of object, but the correspondence between them may not be one-to-one: see 
Exercise 14.4. 

There is also a difference in point of view between our definition of the 
evaluation-at-a map and the school algebra view of a polynomial as a function. 
Namely, in school algebra, a polynomial f is fixed, while the element a (where 
f is to be evaluated) varies over the set of all real numbers. In Definition 14.14, 
it is $\alpha$ which is fixed, while the polynomial $f$ varies over the ring $R[x]$.

Since $\varepsilon_\alpha$ is a function from one ring to another, our guiding principles tell
us that if it is to be worth studying, $\varepsilon_\alpha$ should be a ring homomorphism. This
is indeed the case:




Proposition 14.16. Let $R$ and $S$ be commutative rings with 1 such that $R \leq S$,
and let $\alpha \in S$. Then the evaluation-at-$\alpha$ map $\varepsilon_\alpha$ is a ring homomorphism
from $R[x]$ to $S$.


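Proof. [Show $\forall f, g \in R[x]$, $\varepsilon_\alpha(f+g) = \varepsilon_\alpha(f) + \varepsilon_\alpha(g)$ and $\varepsilon_\alpha(f \cdot g) = \varepsilon_\alpha(f) \cdot
\varepsilon_\alpha(g)$; this means $(f+g)(\alpha) = f(\alpha) + g(\alpha)$ and $(f \cdot g)(\alpha) = f(\alpha) \cdot g(\alpha)$]

Let $f, g \in R[x]$. Write $f = \sum_{j=0}^{n} a_j x^j$ and $g = \sum_{j=0}^{n} b_j x^j$ with $a_j, b_j \in R$
(as previously noted, we can always do this even if $f$ and $g$ have different
degrees, by adding terms to either $f$ or $g$ whose coefficients are 0). We have
$f(\alpha) = \sum_{j=0}^{n} a_j \alpha^j$ and $g(\alpha) = \sum_{j=0}^{n} b_j \alpha^j$. Now $f + g = \sum_{j=0}^{n} (a_j + b_j) x^j$, so
$(f+g)(\alpha) = \sum_{j=0}^{n} (a_j + b_j)\alpha^j = \sum_{j=0}^{n} a_j \alpha^j + \sum_{j=0}^{n} b_j \alpha^j = f(\alpha) + g(\alpha)$.

The proof that $(f \cdot g)(\alpha) = f(\alpha) \cdot g(\alpha)$ is likewise purely formal. The key
point is that the formula defining the product of $f$ and $g$, namely

\[ \left(\sum_{j=0}^{n} a_j x^j\right) \cdot \left(\sum_{k=0}^{n} b_k x^k\right) = \sum_{i=0}^{2n} \left(\sum_{j+k=i} a_j b_k\right) x^i, \]

is the same as the formula for the corresponding product of ring elements:

\[ \left(\sum_{j=0}^{n} a_j \alpha^j\right) \cdot \left(\sum_{k=0}^{n} b_k \alpha^k\right) = \sum_{i=0}^{2n} \left(\sum_{j+k=i} a_j b_k\right) \alpha^i. \]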
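
The homomorphism property can be spot-checked numerically. The following Python sketch is an illustration only, under assumptions of our own: polynomials are coefficient lists indexed by the power of $x$, evaluation uses Horner's rule (a standard rearrangement of the sum in Definition 14.13, not something the text prescribes), and exact rational arithmetic stands in for the ring $S$. The names `evaluate`, `poly_add`, and `poly_mul` are hypothetical.

```python
from fractions import Fraction
from itertools import zip_longest

def evaluate(f, alpha):
    """Evaluate the coefficient list f at alpha by Horner's rule."""
    result = 0
    for c in reversed(f):  # start from the leading coefficient
        result = result * alpha + c
    return result

def poly_add(f, g):
    """Add coefficient lists termwise, padding with zeros."""
    return [a + b for a, b in zip_longest(f, g, fillvalue=0)]

def poly_mul(f, g):
    """Multiply coefficient lists: the x^i coefficient sums a_j * b_k over j + k = i."""
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for j, a in enumerate(f):
        for k, b in enumerate(g):
            out[j + k] += a * b
    return out

f = [-1, 3, 1]           # x^2 + 3x - 1
g = [2, 0, 5]            # 5x^2 + 2
alpha = Fraction(1, 2)

print(evaluate(f, 5))    # 39
print(evaluate(poly_add(f, g), alpha) == evaluate(f, alpha) + evaluate(g, alpha))
print(evaluate(poly_mul(f, g), alpha) == evaluate(f, alpha) * evaluate(g, alpha))
```

Fixing `alpha` and letting the polynomial vary, as in the last two lines, matches the point of view of Definition 14.14.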


Notation 14.17. Let $\varepsilon_\alpha : R[x] \to S$ be the evaluation-at-$\alpha$ map. We denote
the image of $\varepsilon_\alpha$ by $R[\alpha]$. That is,

\[ R[\alpha] := \varepsilon_\alpha(R[x]) = \{\varepsilon_\alpha(f) : f \in R[x]\} = \{f(\alpha) : f \in R[x]\}. \tag{14.9} \]

Remark 14.18. We can also write

\[ R[\alpha] = \{c_0 + c_1 \alpha + c_2 \alpha^2 + \cdots + c_n \alpha^n : n \in \mathbb{N},\ c_i \in R\}. \]

Thus, $R[\alpha]$ is the “set of all polynomials in $\alpha$ with coefficients in $R$.” Even
though $\alpha$ is an element of a ring $S$, and not necessarily a “variable,” we use
the same square-bracket notation in both cases. A unified point of view will
be presented in Chapter 15.


Example 14.19. Let $R = S = \mathbb{R}$, and let $\alpha = 3 \in \mathbb{R}$. Consider the evaluation-at-3
map $\varepsilon_3 : \mathbb{R}[x] \to \mathbb{R}$, $f(x) \mapsto f(3)$. For example, we have
$\varepsilon_3 : x^2 + 7 \mapsto 3^2 + 7 = 16$.

We claim that $\varepsilon_3$ is surjective. An easy way to see this is by considering
the constant polynomials $c \in \mathbb{R} \leq \mathbb{R}[x]$. Evaluating a constant polynomial at
$x = 3$ has no effect on the constant, so $\varepsilon_3(c) = c$ for all $c \in \mathbb{R}$.

What is $\ker(\varepsilon_3)$? We have $f \in \ker(\varepsilon_3)$ iff $f(3) = 0$. Recall that if $f(3) = 0$,
we say that 3 is a root of $f$. From school algebra, the reader may also recall the
Root-Factor Theorem, which tells us that 3 is a root of $f$ iff $x - 3$ is a factor of
$f$. (Later, we shall prove a version of the Root-Factor Theorem for ourselves,
Theorem 14.24.) Thus, we are led to the conclusion that $\ker(\varepsilon_3)$ is the set of
all multiples of $x - 3$ in $\mathbb{R}[x]$, which is just the principal ideal $(x - 3)$. Now by
the Fundamental Theorem of Ring Homomorphisms, Theorem 11.36, we can
say $\mathbb{R}[x]/(x - 3) \cong \mathbb{R}$. We interpret this result as follows. By forcing $x - 3$
to be zero in $\mathbb{R}[x]$, we collapse the ring $\mathbb{R}[x]$ into $\mathbb{R}$ by, in effect, substituting
$x = 3$ in each polynomial in $\mathbb{R}[x]$.


14.2. Polynomials Over a Field 


As we might expect, since fields are the nicest type of ring, polynomial rings 
with coefficients from a field, or polynomial rings “over” a field (as we may 
call them) are the nicest type of polynomial rings. 

We are familiar with the notion of quotients and remainders from ordinary
arithmetic. Namely, suppose that $f, g \in \mathbb{N}$ with $g \neq 0$. When we divide $f$ by $g$,
we can express the result as an integer plus a remainder term: more precisely,
we can write

\[ f/g = q + r/g, \tag{14.10} \]

where $q, r \in \mathbb{N}$ and

\[ 0 \leq r < g. \tag{14.11} \]


The situation with polynomials is not so nice in general. Lemma 14.12 tells 
us that (over a domain) the only polynomials we can “divide” by are those 
that are not “really” polynomials, but only constants: actually, only units of 
the coefficient ring. Nevertheless, when we are working with coefficients from 
a field, it is possible to effectively simulate the results of division, as the next 
theorem shows. By replacing integers with polynomials over a field, clearing 
away denominators in Equation 14.10, and replacing the bound on the size of 
the remainder in Equation 14.11 by a bound on the degree of the remainder, 
we arrive at the following (true!) result. 


Theorem 14.20 (Quotient-Remainder Theorem). Let $F$ be a field. Let $f \in
F[x]$ and let $g \in F[x] - \{0\}$. Then there exist unique polynomials $q, r \in F[x]$
such that $f = q \cdot g + r$ and $\deg(r) < \deg(g)$. (It may be that $r = 0$, in which
case we agree that $\deg(r) = -\infty < 0 \leq \deg(g)$, and our result is still true in
this case.)


Proof. Let $F$ be a field, $f, g \in F[x]$, $g \neq 0$. [Show existence of $q$ and $r$] Let
$S = \{f - a \cdot g : a \in F[x]\}$. Then we have $S \subseteq F[x]$, and $S$ is non-empty. Let
$m = \min\{\deg(s) : s \in S\}$. Assume for a contradiction that $m \geq d := \deg(g)$.
Then $\exists s \in S$ s.t. $\deg(s) = m$, and since $s \in S$ we can write $s = f - a \cdot g$ for
some $a \in F[x]$. Now we have

\[ s = c_m x^m + c_{m-1} x^{m-1} + \cdots + c_0 \]

and

\[ g = b_d x^d + b_{d-1} x^{d-1} + \cdots + b_0 \]

for some coefficients $c_i, b_j \in F$ with $c_m$ and $b_d$ non-zero. [Idea: Perform
one step of a “polynomial division” of $s/g$. The leading term of the quotient would be
$c_m x^m / (b_d x^d) = b_d^{-1} c_m x^{m-d}$.] Since $F$ is a field and $b_d \neq 0_F$, we have $b_d^{-1} \in F$,
and we set $h = b_d^{-1} \cdot c_m \cdot x^{m-d}$. Note that $m - d \geq 0$ by assumption, so $h \in F[x]$.
Set $\tilde{s} = s - h \cdot g$. Then we have

\[ \tilde{s} = s - h \cdot g = (c_m - b_d^{-1} c_m \cdot b_d)\,x^m + (\text{terms of lower degree}), \]

from which we see that $\deg(\tilde{s}) < m$. But $\tilde{s} = s - h \cdot g = f - a \cdot g - h \cdot g =
f - (a + h) \cdot g \in S$ and $\deg(\tilde{s}) < \deg(s)$, contradicting that $s$ has the smallest
degree of any element of $S$. This contradiction shows that $m < d$. Now choose
$r \in S$ with $\deg(r) = m$ and write $r = f - q \cdot g$ with $q \in F[x]$. Therefore,
there exist $q, r \in F[x]$ such that $f = q \cdot g + r$ and $\deg(r) < \deg(g)$.

[Show uniqueness of $q$ and $r$] Now suppose that $q, r, \tilde{q}, \tilde{r} \in F[x]$ are such
that $f = q \cdot g + r = \tilde{q} \cdot g + \tilde{r}$ and $\deg(r), \deg(\tilde{r}) < d$. Then we have $(q - \tilde{q}) \cdot g = \tilde{r} - r$.
Assume for a contradiction that $q \neq \tilde{q}$. Then $q - \tilde{q} \neq 0$, and since also $g \neq 0$
by hypothesis, we can apply Lemma 14.10 to get $\deg((q - \tilde{q}) \cdot g) = \deg(q - \tilde{q}) + \deg(g)$.
By Lemma 14.11, we have $\deg(\tilde{r} - r) < d$. Combining these facts,
we have $d > \deg(\tilde{r} - r) = \deg((q - \tilde{q}) \cdot g) = \deg(q - \tilde{q}) + \deg(g) \geq \deg(g) = d$.
This contradiction shows that $q = \tilde{q}$.

Now since $q \cdot g + r = \tilde{q} \cdot g + \tilde{r}$ and $q = \tilde{q}$, it follows that $r = \tilde{r}$. This proves
uniqueness.
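
The existence argument is essentially an algorithm: repeatedly cancel the leading term of the current remainder, exactly as in the “one step of polynomial division” in the proof. The Python sketch below illustrates this over the field $\mathbb{Q}$, using exact fractions. It is a sketch under our own conventions (coefficient lists indexed by the power of $x$; the names `trim` and `poly_divmod` are hypothetical), not the only way to organize the computation.

```python
from fractions import Fraction

def trim(f):
    """Drop trailing zero coefficients so the last entry is the leading coefficient."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f

def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and deg(r) < deg(g), with coefficients in Q."""
    f = trim([Fraction(c) for c in f])
    g = trim([Fraction(c) for c in g])
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 0)
    r = f
    while len(r) >= len(g):
        shift = len(r) - len(g)      # this is m - d in the proof
        coeff = r[-1] / g[-1]        # this is b_d^{-1} * c_m, which needs a field
        q[shift] = coeff
        for j, b in enumerate(g):    # replace r by r - coeff * x^shift * g
            r[shift + j] -= coeff * b
        r = trim(r)                  # the leading term has cancelled
    return trim(q), r

# (x^3 - 2x + 5) = x * (x^2 + 1) + (-3x + 5)
q1, r1 = poly_divmod([5, -2, 0, 1], [1, 0, 1])
print(q1, r1)

# x^2 = (x/2 - 1/4) * (2x + 1) + 1/4: the quotient leaves Z[x], so a field is needed
q2, r2 = poly_divmod([0, 0, 1], [1, 2])
print(q2, r2)
```

The second example shows where the hypothesis that $F$ is a field enters: dividing by the leading coefficient 2 is not possible within the integers.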


Our hunch that polynomial rings over a field ought to be well-behaved is 
made precise in the next result, which says that such rings are principal ideal 
domains. Before stating this result, we need one definition. 


Definition 14.21. A polynomial is monic if its leading coefficient is 1. That
is, let $R$ be a commutative ring with 1, and let $f \in R[x] - \{0\}$, with $\deg(f) = d$.
Write $f = c_d x^d + c_{d-1} x^{d-1} + \cdots + c_1 x + c_0$, with $c_i \in R$. Then $f$ is called
monic if $c_d = 1$.


Theorem 14.22. Let $F$ be a field, and let $I$ be an ideal of $F[x]$. If $I \neq (0)$,
then there is a unique monic polynomial $f \in F[x]$ such that $I = (f)$. In
particular, $F[x]$ is a principal ideal domain.


Proof. First, since every field is a domain (Proposition 11.14), $F[x]$ is a domain
by Lemma 14.10.

Suppose that $I$ is an ideal of $F[x]$ with $I \neq (0)$. [Idea: If $I$ is really principal,
generated by a polynomial $f$, then every polynomial in $I$ is a multiple of $f$.
Thus, $f$ should have the smallest degree of any (non-zero) element of $I$.] Let
$m = \min\{\deg(a) : a \in I - \{0\}\}$. Then $\exists g \in I - \{0\}$ such that $\deg(g) = m$.
So we can write $g = \sum_{j=0}^{m} c_j x^j$ with $c_j \in F$ and $c_m \neq 0$. Since $F$ is a field,
we have $c_m \in F^\times$, and we set $f = c_m^{-1} \cdot g \in F[x]$. Observe that $f$ is monic and
$\deg(f) = \deg(g)$.

[Show that $I = (f)$] Let $h \in I$. By the Quotient-Remainder Theorem
(Theorem 14.20), there exist $q, r \in F[x]$ such that $\deg(r) < \deg(f)$ and
$h = q \cdot f + r$. So $r = h - q \cdot f$. Since $f, h \in I$ and $q \in F[x]$, we have $r \in I$. Now
$\deg(r) < \deg(f)$, but there is no non-zero polynomial in $I$ whose degree is less
than the degree of $f$. Therefore, $r = 0$. It follows that $h = q \cdot f \in (f)$. Thus,
$I \subseteq (f)$. Conversely, since $f \in I$, we have $(f) \subseteq I$. This shows that $I = (f)$.

[Show uniqueness of $f$] Suppose that $\tilde{f} \in F[x]$ is another monic polynomial
such that $I = (\tilde{f})$. Then $(f) = (\tilde{f})$, so by Lemma 12.16, we have $\tilde{f} = u \cdot f$ for
some $u \in (F[x])^\times$. Now by Lemma 14.12, we can say $u \in F^\times$. In particular,
we have $\deg(u) = 0$, so by Lemma 14.10, we have $\deg(\tilde{f}) = \deg(f)$. Since $f$
is monic of degree $m$, we can write $f = x^m + (\text{terms of smaller degree})$. Since
$\tilde{f} = u \cdot f$ and $u \in F^\times$, we have $\tilde{f} = u \cdot x^m + (\text{terms of smaller degree})$. But $\tilde{f}$
is also monic, which forces $u = 1$. Therefore, $\tilde{f} = f$, as desired.

Now we know that every non-zero ideal of $F[x]$ is principal; since the zero
ideal $(0)$ is also principal, we have proved that $F[x]$ is a PID.


After defining the term “root” in a suitably general manner, we will be 
ready to state and prove the Root-Factor Theorem for polynomials over an 
arbitrary field. 


Definition 14.23. Let $R$ be a commutative ring with 1, and let $f \in R[x]$.
An element $\alpha \in R$ is called a root of $f$ if $f(\alpha) = 0$.


Theorem 14.24 (Root-Factor Theorem). Let $F$ be a field. Let $f \in F[x]$ and
$\alpha \in F$. Then $\alpha$ is a root of $f$ iff $(x - \alpha) \mid f$ in $F[x]$.


Proof. ($\Rightarrow$): Suppose that $\alpha$ is a root of $f$. By the Quotient-Remainder
Theorem (Theorem 14.20), there exist unique polynomials $q, r \in F[x]$ such that

\[ f = (x - \alpha) \cdot q + r \tag{14.12} \]

and $\deg(r) < 1$. Since $r$ has degree less than 1, we must have $r \in F$. Evaluating
both sides of Equation 14.12 at $x = \alpha$, and using the fact that evaluation at
$\alpha$ is a ring homomorphism (by Proposition 14.16), we get

\[ f(\alpha) = (\alpha - \alpha) \cdot q(\alpha) + r(\alpha). \tag{14.13} \]

Now since $r \in F$, we have $r(\alpha) = r$. Since $\alpha$ is a root of $f$, we have $f(\alpha) = 0$.
Therefore, $r = 0$, so $f = (x - \alpha) \cdot q$, and $(x - \alpha) \mid f$.

($\Leftarrow$): Suppose that $(x - \alpha) \mid f$ in $F[x]$. This means that $f = (x - \alpha) \cdot q$ for
some $q \in F[x]$. Evaluating at $\alpha$, we find $f(\alpha) = 0$, so $\alpha$ is a root of $f$.
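
Equation 14.12 can be computed directly. Dividing by a linear polynomial $x - \alpha$ is usually done by “synthetic division,” which is Horner's rule with the intermediate values recorded; the final value is the remainder $r = f(\alpha)$. The Python sketch below illustrates this with integer coefficients. The function name `divide_by_linear` and the coefficient-list convention (position $k$ holds the coefficient of $x^k$) are assumptions of ours.

```python
def divide_by_linear(f, alpha):
    """Return (q, r) with f = (x - alpha) * q + r, where r = f(alpha) is a constant."""
    acc = 0
    partial = []
    for c in reversed(f):  # highest-degree coefficient first
        acc = acc * alpha + c
        partial.append(acc)
    r = partial.pop() if partial else 0  # the last Horner value is f(alpha)
    return list(reversed(partial)), r    # the earlier values are the coefficients of q

f = [-6, 11, -6, 1]  # x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)

print(divide_by_linear(f, 2))  # ([3, -4, 1], 0): 2 is a root, and q = x^2 - 4x + 3
print(divide_by_linear(f, 4))  # ([3, -2, 1], 6): 4 is not a root, and f(4) = 6
```

The remainder vanishes exactly when $\alpha$ is a root, which is the content of the theorem.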


Next we state and prove a result familiar from school algebra: a polynomial 
of degree n can have at most n roots—if we are working over a field. But see 
Exercise 14.5 for a counterexample to this result over a ring which is not a 
domain. 




Theorem 14.25 (Root Bound on Polynomials). If $f \in F[x]$ is a non-zero
polynomial over a field $F$, then the number of roots of $f$ in $F$ is at most the
degree of $f$.


Proof. Let $F$ be a field, and let $f \in F[x] - \{0\}$. Set $R_f = \{\alpha \in F : f(\alpha) = 0\}$,
the set of all roots of $f$ in $F$. We proceed by induction on $d := \deg(f)$.

Base Case: $d = 0$.

Then $f \in F - \{0\}$, so $f$ is a non-zero constant polynomial, which has no
roots in $F$. So $R_f = \emptyset$ and $|R_f| = 0 = \deg(f)$.

Inductive Step: Suppose that $d \geq 1$, and assume inductively that if
$g \in F[x] - \{0\}$ and $\deg(g) < d$, then $|R_g| \leq \deg(g)$.

If $f$ has no roots in $F$, then we have $|R_f| = 0 < d$, so we are done.
Therefore, we may assume that there exists a root $\alpha$ of $f$ in $F$.

By the Root-Factor Theorem (Theorem 14.24), we can write

\[ f = (x - \alpha) \cdot g \tag{14.14} \]

for some $g \in F[x]$. Since $f \neq 0$, we must also have $g \neq 0$. Thus, Lemma
14.10 applies to give $\deg(f) = \deg(x - \alpha) + \deg(g)$. So $d = 1 + \deg(g)$ and
$\deg(g) = d - 1$. By the inductive hypothesis, we have

\[ |R_g| \leq \deg(g) = d - 1. \tag{14.15} \]

We claim that

\[ R_f = R_g \cup \{\alpha\}. \tag{14.16} \]

To prove this claim, let $\beta \in R_f$. Then we have $f(\beta) = 0 = (\beta - \alpha) \cdot g(\beta)$.
Since $F$ is a domain (by Proposition 11.14), we have $\beta - \alpha = 0$ or $g(\beta) = 0$.
Thus, either $\beta = \alpha$ or $\beta \in R_g$. This shows that $R_f \subseteq R_g \cup \{\alpha\}$. Conversely,
Equation 14.14 shows immediately that $R_g \cup \{\alpha\} \subseteq R_f$.

Equations 14.15 and 14.16 together allow us to write $|R_f| \leq |R_g| + 1 \leq
(d - 1) + 1 = d$, as desired.
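
Over a small finite field the root bound can be observed by brute force. The Python sketch below is an illustration only: it assumes $p$ is prime, so that the integers modulo $p$ form a field, and it represents a polynomial as a list of integer coefficients indexed by the power of $x$. The function name `roots_mod_p` is hypothetical.

```python
def roots_mod_p(f, p):
    """List the roots in {0, ..., p-1} of the coefficient list f, working modulo p."""
    return [a for a in range(p)
            if sum(c * pow(a, k, p) for k, c in enumerate(f)) % p == 0]

p = 7  # assumed prime, so Z/7Z is a field

cubic = [0, -1, 0, 1]             # x^3 - x
quadratic = [1, 0, 1]             # x^2 + 1
sextic = [-1, 0, 0, 0, 0, 0, 1]   # x^6 - 1

print(roots_mod_p(cubic, p))      # [0, 1, 6]: three roots, degree 3
print(roots_mod_p(quadratic, p))  # []: no roots, degree 2
print(roots_mod_p(sextic, p))     # [1, 2, 3, 4, 5, 6]: six roots, degree 6
```

In each case the number of roots is at most the degree, and the last example shows that the bound can be attained.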


14.3. Exercises 


Exercise 14.1. (a) Prove the distributive property for polynomials over a com- 
mutative ring with 1 by using the formulas for the ring operations given in 
Definition 14.1. 

(b) Find an example of an abelian group $(S, +)$ and an associative binary
operation $\cdot$ on $S$ with a subset $T \subseteq S$ such that $\langle T \rangle = S$, and such that for
all $t, u, v \in T$, we have $t \cdot (u + v) = t \cdot u + t \cdot v$ and $(u + v) \cdot t = u \cdot t + v \cdot t$, but
such that $(S, +, \cdot)$ is not a ring. That is, to check the distributive property for
$S$, it is not enough to check it for a set which generates $S$ under addition.




Exercise 14.2. Let $R$ be a commutative ring with 1. Prove the converse of the
second part of Lemma 14.10: If $R[x]$ is a domain, then $R$ is a domain.


Exercise 14.3. Prove Lemma 14.11. 


Exercise 14.4. Let R = Z/3Z, and let f(x) = 22-2 € Riz]. Leto : ROR 
be the function which corresponds to $f$ via evaluation, namely $\sigma_f(a) = f(a)$
for $a \in R$.

(a) Find the values $\sigma_f(a)$ for each $a$ in $R$.

(b) Find another polynomial $g \in R[x]$ such that $g \neq f$ but $\sigma_g = \sigma_f$.


Exercise 14.5. Find a commutative ring $R$ with 1 such that the polynomial
$x^2 \in R[x]$ has more than 2 distinct roots in $R$. Hint: Choose $R$ to be the ring
of integers modulo $n$, for an appropriate choice of $n$.


Exercise 14.6. Let $F$ be a field. Prove that every polynomial of degree 1 in
$F[x]$ is irreducible in $F[x]$.

Exercise 14.7. Let $F$ be a field and let $f \in F[x]$.

(a) Prove that if $f$ is irreducible in $F[x]$ and $f$ has a root in $F$, then
$\deg(f) = 1$.

(b) Suppose that $\deg(f) \in \{2, 3\}$. Prove that $f$ is irreducible in $F[x]$ iff $f$
does not have any roots in $F$.

(c) Find a counterexample to the biconditional in part (b) using a field $F$
of your choice and a polynomial $f$ of degree 4.


Exercise 14.8. Let $F$ be a field and let $V = F[x]$.

(a) Prove that $V$ is a vector space over $F$ under the operations $+$ and $\cdot$ of
$F[x]$ (after restricting the latter operation to $F \times V$).

(b) Prove that the set $B := \{x^j : j \in \mathbb{N}\}$ is a basis of $V$ over $F$; thus, $V$
is an infinite-dimensional $F$-vector space.


Exercise 14.9. Let $F$ be a field and let $V = F[x]$. You may use the result of
Exercise 14.8 here.

(a) Let $V_n$ be the set of all polynomials in $V$ of degree at most $n$. Prove
that $V_n \leq V$ and $\dim_F(V_n) = n + 1$.

(b) Suppose that $f \in V$ and that $f$ factors into linear terms as
$f = a \cdot \prod_{j=1}^{n} (x - r_j)$ with $a, r_1, \ldots, r_n \in F$, $a \neq 0$, and $r_1, \ldots, r_n$ distinct. For each
$i \in \{1, \ldots, n\}$, let $f_i = a \cdot \prod_{j \neq i} (x - r_j)$. Prove that $\{f_1, \ldots, f_n\}$ is a basis of
$V_{n-1}$ over $F$. Hint: start with a linear dependence relation involving the $f_i$,
and then apply carefully chosen evaluation maps. Note: this result justifies
the method of partial fractions in calculus to integrate a rational expression
in the case when the denominator factors into distinct linear terms.


Exercise 14.10. (a) Let $\sigma : R \to S$ be a homomorphism between two commutative
rings with 1 such that $\sigma(1_R) = 1_S$. Let $\alpha$ be an element of $S$. Prove that
there is a unique ring homomorphism $\tilde{\sigma} : R[x] \to S$ such that $\tilde{\sigma}|_R = \sigma$ and
$\tilde{\sigma}(x) = \alpha$. (Notice that this construction generalizes the evaluation mappings.)

(b) Prove that if $T$ shares the property of $R[x]$ expressed in part (a) above,
then $T \cong R[x]$. More precisely:

Suppose that $T$ is a commutative ring with 1 such that $R \leq T$. Suppose
that $a \in T$. Finally, suppose that for every commutative ring $S$ with 1, for
every element $\alpha \in S$, and for every ring homomorphism $\sigma : R \to S$ which
maps $1_R$ to $1_S$, there is a unique ring homomorphism $\tilde{\sigma} : T \to S$ which
extends $\sigma$ and maps $a$ to $\alpha$. Prove that $T \cong R[x]$ via an isomorphism sending
$a$ to $x$.


Exercise 14.11. Let $R$ be a commutative ring with 1. The polynomial ring
over $R$ in the variables $x_1, \ldots, x_n$ is defined recursively as $R[x_1, \ldots, x_n] :=
(R[x_1, \ldots, x_{n-1}])[x_n]$. Prove that if $S$ is a commutative ring with 1, $\sigma : R \to S$
is a ring homomorphism with $\sigma(1_R) = 1_S$, and $\alpha_1, \ldots, \alpha_n$ are elements of $S$,
then there is a unique ring homomorphism $\tilde{\sigma} : R[x_1, \ldots, x_n] \to S$ such that
$\tilde{\sigma}|_R = \sigma$ and $\tilde{\sigma}(x_j) = \alpha_j$ for all $j \in \{1, \ldots, n\}$. Note: for a polynomial
$f \in R[x_1, \ldots, x_n]$ it is natural to denote the image $\tilde{\sigma}(f)$ by $f(\alpha_1, \ldots, \alpha_n)$, and to
refer to $f(\alpha_1, \ldots, \alpha_n)$ as “$f$ evaluated at $(\alpha_1, \ldots, \alpha_n)$.” We define $R[\alpha_1, \ldots, \alpha_n]$
to be the image of $\tilde{\sigma}$. We define $\{\alpha_1, \ldots, \alpha_n\}$ to be algebraically independent
over $R$ (with respect to $\sigma$) if $\tilde{\sigma}$ is injective; this is a generalization of the notion
of a transcendental element, and is especially important when $R \leq S$ and $\sigma$
is just the inclusion map.

Exercise 14.12. Let $R$ be a ring with 1 (not necessarily commutative), and let
$C = C(R)$ be the center of $R$ (see Exercise 11.18). Let $\alpha \in R$. Prove that the
set $S := \{c_0 + c_1 \alpha + c_2 \alpha^2 + \cdots + c_n \alpha^n : n \in \mathbb{N},\ c_i \in C\}$ is a commutative ring
with $S \leq R$. Thus we have $S = C[\alpha]$, the image of the evaluation-at-$\alpha$ map
from $C[x]$ to $S$.


Exercise 14.13. Find a commutative ring $R$ with 1 such that $(R[x])^\times \neq R^\times$.
(Compare to Lemma 14.12.)


Exercise 14.14 (Two-Thirds Rule for Factoring a Polynomial over a Field).
Let $F \leq K$ be fields, and let $f, g, h \in K[x] - (0)$ be such that $f = g \cdot h$. Prove
that if any 2 of these 3 polynomials are in $F[x]$, then so is the third.


Exercise 14.15 (Composition of Polynomials). Let $R$ be a commutative ring
with 1, and let $S = R[x]$. Observe that since $R \leq S$, then for any $f \in S$ we
may consider the evaluation-at-$f$ map $\varepsilon_f : S \to S$ which sends $x$ to $f$. Define
a binary operation $\circ$ on $S$ by the formula $g \circ f = \varepsilon_f(g)$ for any $f, g \in S$.
We may refer to $\circ$ as composition, since the standard notation for evaluation
allows us to write $g \circ f = g(f)$.

(a) Prove that $\circ$ is associative.

(b) Prove that $x$ is an identity element for $\circ$.

(c) Now suppose that $R$ is a domain. Prove that if $f, g \in S - (0)$, then
we have $\deg(g \circ f) = \deg(g) \cdot \deg(f)$. Use this formula to help show that
the elements of $S$ which have inverses with respect to $\circ$ are precisely the
polynomials of degree 1.


15 
Field Theory 


15.1 Extension Fields 


The theory of polynomial rings over a field is closely connected to the general 
theory of fields. We illustrate this connection in the following example. 


Example 15.1. Consider the evaluation-at-$i$ map

\[ \varepsilon_i : \mathbb{R}[x] \to \mathbb{C} \]

which maps a polynomial $f(x)$ with real coefficients to the complex number
$f(i)$.

We claim that $\varepsilon_i$ is surjective. To see this, consider the image under $\varepsilon_i$ of
a typical linear polynomial $a + bx$ with $a, b \in \mathbb{R}$. We have $\varepsilon_i(a + bx) = a + bi$,
which is the general form of a complex number.

Set $K = \ker(\varepsilon_i)$. Since $\varepsilon_i$ is a ring homomorphism (by Proposition 14.16),
we know that $K$ is an ideal of $\mathbb{R}[x]$. By Theorem 14.22, either $K = (0)$ or
there is a unique monic polynomial $f(x) \in \mathbb{R}[x]$ such that $K = (f)$; further,
from the proof of that theorem, $f$ would be a non-zero polynomial of smallest
degree in $K$.

We have $f \in K$ iff $\varepsilon_i(f) = 0$ iff $f(i) = 0$. In other words, to find elements
of $K$, we must seek polynomial relations satisfied by $i$ with coefficients in $\mathbb{R}$.
The basic formula

\[ i^2 = -1 \]

gives us such a polynomial relation. It can be rewritten as $i^2 + 1 = 0$, so we
realize that $g(i) = 0$ where $g(x) = x^2 + 1$. Thus we have $g \in K$; in particular,
$K \neq (0)$. Since $\deg(g) = 2$, we should ask whether there are any non-zero
polynomials of degree 1 in $K$. Well, if $f(x) = a + bx$ with $a, b \in \mathbb{R}$, then
$f(i) = a + bi$, which is not zero unless $a = b = 0$. Therefore, $g$ has the smallest
degree of any non-zero polynomial in $K$. Note that $g$ is monic, so in fact $g$ is
the generator of $K$ described in Theorem 14.22.

Now we know that $\ker(\varepsilon_i) = (g) = (x^2 + 1)$. Since $\varepsilon_i$ is surjective (as shown
earlier in this example), the Fundamental Theorem of Ring Homomorphisms
(Theorem 11.36) tells us that

\[ \mathbb{R}[x]/(x^2 + 1) \cong \mathbb{C}. \tag{15.1} \]


DOI: 10.1201/9781003252139-15




Notice that everything on the left-hand side of (15.1) involves only real
quantities. By starting with the field $\mathbb{R}$, we were able to produce the field $\mathbb{C}$, as
follows: First, add in a new element, $x$, to $\mathbb{R}$, to get $\mathbb{R}[x]$. This $x$ is a blank
slate for our handiwork, because it does not have any algebraic relations with
elements of $\mathbb{R}$. Next, we force the equation $x^2 = -1$ to have a solution by
forming the quotient ring $\mathbb{R}[x]/(x^2 + 1)$. In this quotient ring, let $\bar{x}$ denote
the image of $x$ under the natural map

\[ \mathbb{R}[x] \to \mathbb{R}[x]/(x^2 + 1), \quad f \mapsto f + (x^2 + 1). \]

In other words, set $\bar{x} = x + (x^2 + 1) \in S := \mathbb{R}[x]/(x^2 + 1)$. Then we have
$\bar{x}^2 = -1_S$.

If we were familiar with $\mathbb{R}$ but did not know about $\mathbb{C}$, then we could
actively seek to “extend” $\mathbb{R}$ by forming such a quotient ring. Desiring a new
“number” $i$ satisfying $i^2 = -1$, we would simply form the quotient ring
$\mathbb{C} := \mathbb{R}[x]/(x^2 + 1)$, check that $\mathbb{R} \hookrightarrow \mathbb{C}$, and agree to use the notation
$i := x + (x^2 + 1) \in \mathbb{C}$. From this point of view, we need not fear $i$ or label $i$ dubiously as an
“imaginary” number: instead, $i$ is simply an element of a quotient of $\mathbb{R}[x]$.
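
The isomorphism (15.1) can be made tangible with a short computation. By the Quotient-Remainder Theorem, every coset in $\mathbb{R}[x]/(x^2 + 1)$ has a unique representative of the form $a + bx$, so the Python sketch below stores a coset as a pair `(a, b)`; this representation and the function name `mul_mod_x2_plus_1` are assumptions made for illustration. Multiplying two representatives and replacing $x^2$ by $-1$ reproduces the usual multiplication of complex numbers.

```python
def mul_mod_x2_plus_1(u, v):
    """Multiply a + b*x and c + d*x, then reduce using x^2 = -1."""
    a, b = u
    c, d = v
    # (a + bx)(c + dx) = ac + (ad + bc)x + bd*x^2, and x^2 is congruent to -1
    return (a * c - b * d, a * d + b * c)

x_bar = (0, 1)                            # the coset of x
print(mul_mod_x2_plus_1(x_bar, x_bar))    # (-1, 0): the coset of x squares to -1

u, v = (1, 2), (3, -1)
print(mul_mod_x2_plus_1(u, v))            # (5, 5)
print(complex(1, 2) * complex(3, -1))     # (5+5j), the same product computed in C
```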


Definition 15.2. Let $F$ and $K$ be fields. If $F$ is a subring of $K$,

\[ F \leq K, \]

then we also say that $F$ is a subfield of $K$, or that $K$ is an extension field of
$F$. We may also say that $K$ is a superfield of $F$.


Remark 15.3. Whether we use the term subfield or extension field depends on 
our point of view. If we start out knowing the bigger field K, then we usually 
refer to $F$ as a subfield of $K$; if we start out only knowing the smaller field
$F$, and then discover or construct $K$, then we usually speak of $K$ as being an
extension field of $F$ instead of referring to $F$ as a subfield of $K$.


If $K$ is an extension field of $F$ and $\alpha \in K$, then one important question we
can ask is, what algebraic relations, if any, does $\alpha$ satisfy with respect to $F$?
For example, if $\alpha = i \in \mathbb{C}$, then what algebraic relations does $i$ satisfy with
respect to $\mathbb{R}$? From Example 15.1, we see that the answer to this question is
tied up with the kernel of the evaluation-at-$i$ map from $\mathbb{R}[x]$ to $\mathbb{C}$. We next
explore this relationship in general.


Proposition 15.4. Let $F$ and $K$ be fields. Suppose that $F$ is a subfield of $K$,
and let $\alpha \in K$. Let $\varepsilon_\alpha : F[x] \to K$ be the evaluation-at-$\alpha$ map. Then one of
the following must be the case:
(i) $\ker(\varepsilon_\alpha) = (f)$ for a unique non-zero monic polynomial $f \in F[x]$. In
this case, $f$ is irreducible in $F[x]$, we have $f(\alpha) = 0$, and $F[\alpha] \cong F[x]/(f)$.
(ii) $\ker(\varepsilon_\alpha) = (0)$. In this case, $\alpha$ is not a root of any non-zero polynomial
in $F[x]$, and we have $F[\alpha] \cong F[x]$.




Proof. Let $I = \ker(\varepsilon_\alpha)$. First suppose that $I \neq (0)$. By Proposition 14.16,
$\varepsilon_\alpha$ is a ring homomorphism. So by Theorem 11.36, $I$ is an ideal of $F[x]$. We
know from Theorem 14.22 that there is a unique monic polynomial $f \in F[x]$
such that $I = (f)$. We have $f \neq 0$ since $I \neq (0)$. Also by Theorem 11.36, we
know that the image of $\varepsilon_\alpha$ is a subring of $K$; that is, $F[\alpha] \leq K$. Now $K$ is
a field, hence also a domain (by Proposition 11.14); and $F[\alpha]$, since it is a
subring of $K$ containing 1, must also be a domain (by Exercise 11.11). The
Fundamental Theorem of Ring Homomorphisms (Theorem 11.36 again; what
a useful theorem!) now tells us that $F[\alpha] \cong F[x]/(f)$. By Theorem 12.8, we
can say that $(f)$ is a prime ideal of $F[x]$. We can use Exercise 12.12 to assert
that $f$ is a prime element of $F[x]$. Since $F[x]$ is a domain (for example, by
Theorem 14.22), then by Exercise 12.11, we conclude that $f$ is irreducible in
$F[x]$. We have $f(\alpha) = 0$ because $f \in \ker(\varepsilon_\alpha)$.

Now suppose that $I = (0)$. This means that for every polynomial
$f \in F[x] - (0)$, we have $f(\alpha) \neq 0$, as claimed. Theorem 11.36 gives $F[\alpha] \cong F[x]/(0)$.
But $F[x]/(0) \cong F[x]$ (by Exercise 11.6). Therefore, $F[\alpha] \cong F[x]$ (by Exercise
11.12).


The following definition attaches some names to the wealth of information 
contained in the outcomes of the two cases in Proposition 15.4. 


Definition 15.5. Let $F$ and $K$ be fields with $F \leq K$, and let $\alpha \in K$.

Case (i): If $\alpha$ is a root of some non-zero polynomial in $F[x]$, then we say
that $\alpha$ is algebraic over $F$. Let $f$ be the unique non-zero monic polynomial
which generates the kernel of the evaluation map $\varepsilon_\alpha : F[x] \to K$ in this case
according to Proposition 15.4. Then $f$ is called the irreducible polynomial for
$\alpha$ over $F$ in the variable $x$. We use the notation $f = \mathrm{Irr}(\alpha, F, x)$.

Case (ii): If $\alpha$ is not a root of any non-zero polynomial in $F[x]$, then we
say that $\alpha$ is transcendental over $F$.


Definition 15.6. Let $F$ and $K$ be fields with $F \leq K$. If $\alpha$ is algebraic over
$F$ for all $\alpha$ in $K$, then we say that $K$ is an algebraic extension of $F$, or that
$K$ is algebraic over $F$.


Remark 15.7. Now we can see how the square-brackets notation $F[\alpha]$, when
$\alpha$ is an element of a ring containing $F$, meshes with the notation $F[x]$ for a
polynomial ring in the variable $x$. They actually amount to the same thing
when $\alpha$ is transcendental over $F$. In other words, a transcendental element
behaves like an independent variable!


Example 15.8. Consider the element $\sqrt{2} \in \mathbb{R}$. Since $\mathbb{Q} \leq \mathbb{R}$, we can ask
whether $\sqrt{2}$ is algebraic over $\mathbb{Q}$. The answer is yes, because (for instance) $\sqrt{2}$
is a root of the non-zero polynomial $x^2 - 2 \in \mathbb{Q}[x]$. Notice that many other
polynomials in $\mathbb{Q}[x]$ also have $\sqrt{2}$ as a root, such as $x^4 - 4$ and $x^3 + x^2 - 2x - 2$,
but it is not hard to see that in fact $x^2 - 2 = \mathrm{Irr}(\sqrt{2}, \mathbb{Q}, x)$.
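
Since $\ker(\varepsilon_{\sqrt{2}}) = (x^2 - 2)$, every polynomial in $\mathbb{Q}[x]$ with $\sqrt{2}$ as a root must be a multiple of $x^2 - 2$. For the two polynomials named above, this can be confirmed by multiplying out explicit factorizations. The Python sketch below does so; the cofactors $x^2 + 2$ and $x + 1$ were found by hand, and the helper `poly_mul` with its coefficient-list convention (position $k$ holds the coefficient of $x^k$) is our own.

```python
def poly_mul(f, g):
    """Multiply coefficient lists: the x^k coefficient sums a_i * b_j over i + j = k."""
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

irr = [-2, 0, 1]                  # x^2 - 2

quartic = poly_mul(irr, [2, 0, 1])   # (x^2 - 2)(x^2 + 2)
cubic = poly_mul(irr, [1, 1])        # (x^2 - 2)(x + 1)

print(quartic)  # [-4, 0, 0, 0, 1], that is, x^4 - 4
print(cubic)    # [-2, -2, 1, 1], that is, x^3 + x^2 - 2x - 2
```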


Example 15.9. We are not in a position yet to display fully supported examples
of transcendental elements, but some attempt should be made. It can be shown
(but is not at all obvious) that the real number $\pi$ is transcendental over $\mathbb{Q}$.
Thus, by Proposition 15.4, we have $\mathbb{Q}[\pi] \cong \mathbb{Q}[x]$ via a map sending $\pi$ to $x$.
This is much stronger than saying that $\pi$ is irrational. In fact, $\pi$ “transcends”
any description by finitely many rational numbers: $\pi$ not only cannot
be expressed as a ratio of integers, but cannot be realized as the root of any
non-zero polynomial with rational coefficients.

As another example, we could take a variable $x$ and try to show that $x$
is transcendental over $\mathbb{Q}$, as indeed it should be. The trouble here is that
$x$ would first have to live in some field containing $\mathbb{Q}$. It turns out that
the polynomial ring $\mathbb{Q}[x]$ naturally embeds in a field of “rational functions”
$\mathbb{Q}(x) := \{f/g : f, g \in \mathbb{Q}[x] \text{ and } g \neq 0\}$; see Chapter 20. With the field extension
$\mathbb{Q} \leq \mathbb{Q}(x)$ in hand, we can say that indeed $x$ is transcendental over $\mathbb{Q}$.

For an explicit construction of a real number which is transcendental over
$\mathbb{Q}$, see Exercise 15.17.


In Proposition 15.4, we found that the kernel of any evaluation map $\varepsilon_\alpha$
from $F[x]$ to $K$, where $F$ is a subfield of $K$ and $\alpha$ is algebraic over $F$, is
generated by an irreducible polynomial $f \in F[x]$ such that $\alpha$ is a root of $f$.
We also saw that $F[\alpha] \cong F[x]/(f)$ in this situation.

Next, we take the opposite point of view: we start with a field $F$ and a
polynomial $f \in F[x]$ such that $f$ is irreducible in $F[x]$. We would like to find
an extension field $K$ of $F$ which contains a root of $f$.

The idea here is to form the quotient ring $Q = F[x]/(f)$. In this ring, we
have forced $f$ to equal zero; more precisely, we would like to say that $f(\bar{x}) = 0$,
where $\bar{x} = x + (f) \in Q$. Thus, $\bar{x}$ would be a root of $f$ in $Q$. For the evaluation
of $f$ at $\bar{x}$ to make sense, $Q$ should be a ring which contains $F$, and if we
were lucky enough to find that $Q$ is actually a field containing $F$, this would
allow us to achieve our goal by setting $K = Q$. This turns out to work, as the
following result indicates.


Theorem 15.10. Let $F$ be a field, and let $f \in F[x]$. Suppose that $f$ is
irreducible in $F[x]$. Then:

(1) $(f)$ is a maximal ideal of $F[x]$.

(2) $F[x]/(f) =: K$ is a field which naturally contains $F$.

(3) The element $\alpha := x + (f) \in K$ is a root of $f$ in $K$.

(4) $\dim_F(K) = \deg(f)$.


Proof. Set $R = F[x]$.

(1) First, we have $f \notin R^\times$ by definition of irreducible, so $(f)$ is a proper
ideal of $R$ by Exercise 12.6. Suppose that $(f) \subsetneq I \subseteq R$, for some ideal $I$ of
$R$. [Show $I = R$] Since $R$ is a PID (by Theorem 14.22), $\exists g \in R$ such that
$I = (g) = Rg$. Now $f \in (f) \subseteq Rg$, so $\exists h \in R$ such that $f = hg$. Since $f$ is
irreducible in $R$, either $h \in R^\times$ or $g \in R^\times$. If $h \in R^\times$, then $(f) = (g) = I$
by Lemma 12.16, a contradiction. Therefore, $g \in R^\times$, so $I = (g) = R$ (by
Exercise 12.6 again). This shows that $(f)$ is a maximal ideal of $R$.




(2) Since $(f)$ is maximal, Theorem 12.21 tells us that $R/(f)$ is a field. Now consider the map

$$\psi : F \to K, \quad c \mapsto c + (f).$$

This is just the natural map $F[x] \to K$ restricted to $F$, so $\psi$ is a ring homomorphism. Now $\ker(\psi)$ is an ideal of $F$ (by Theorem 11.36), hence is either $(0)$ or $F$ (by Exercise 11.7). In case $\ker(\psi) = F$, we would have $\psi(1_F) = 0_K$, so $1_F + (f) = 0_F + (f)$, hence $1_F = 1_R \in (f)$; so $(f) = R$, which is impossible since maximal ideals are proper. Therefore, $\ker(\psi) = (0)$. By Exercise 11.23, $\psi$ is an embedding. Since $F \hookrightarrow K$ via the natural map $\psi$, we regard $F$ as a subfield of $K$, and write $F \leq K$. (This is a slight abuse of notation, since actually $\psi(F) \leq K$; but such slurring of distinctions becomes one of the most important coping mechanisms for the advanced algebraist who must deal with many technically distinct but naturally isomorphic structures at the same time!)

(3) Again, we regard $F$ as a subfield of $K$, by identifying $c \in F$ with $c + (f) \in K$. Thus the evaluation-at-$\alpha$ map $\varepsilon_\alpha : F[x] \to K$ makes sense. We have, for any $g \in F[x]$, the equality

$$g(x + (f)) = g(x) + (f). \tag{15.2}$$

(See Exercise 15.3 for a detailed justification of this equation.) We have $f(x) \in F[x]$, so we compute $\varepsilon_\alpha(f) = f(\alpha) = f(x + (f)) = f(x) + (f) = 0 + (f) = 0_K$, since certainly $f(x) \in (f)$.

(4) Let $d = \deg(f)$. From the proof of part (2) above, we know that $d \geq 1$. We claim that the set

$$B := \{1 + (f),\ x + (f),\ x^2 + (f),\ \dots,\ x^{d-1} + (f)\}$$

is a basis for $K$ over $F$.

[Show that $B$ spans $K$ over $F$] Let $a \in K$. Then $\exists g \in R$ such that $a = g + (f)$. By Theorem 14.20, there are unique elements $q, r \in R$ such that $g = q \cdot f + r$ and $\deg(r) < d$. Now $q \cdot f \in (f)$, so we have $g + (f) = r + (f)$ by Exercise 8.2. Since $\deg(r) < d$, we can write $r = \sum_{j=0}^{d-1} c_j x^j$ for unique elements $c_j \in F$. Now we have

$$a = r + (f) = \Big(\sum_{j=0}^{d-1} c_j x^j\Big) + (f) = \sum_{j=0}^{d-1} \big(c_j x^j + (f)\big) = \sum_{j=0}^{d-1} c_j \big(x^j + (f)\big) \in \langle B \rangle,$$

the vector space spanned by $B$ over $F$. Since $a$ was an arbitrary element of $K$, we have shown that $\langle B \rangle = K$.

[Show that $0_K$ has a unique representation as an $F$-linear combination of elements of $B$] Finally, suppose that $c_j \in F$ for $j$ between $0$ and $d - 1$, and that $\sum_{j=0}^{d-1} c_j (x^j + (f)) = 0_K$. Then we have $r + (f) = 0_K = 0_R + (f)$, where $r = \sum_{j=0}^{d-1} c_j x^j$. Therefore, $r \in (f)$. So $r = fg$ for some $g \in R$. But $\deg(r) < \deg(f)$. It follows that $g = 0$ and $r = 0$. So $c_j = 0$ for all $j$. This argument also shows that the elements $b_j := x^j + (f)$ for $j \in \{0, \dots, d-1\}$ are distinct, since if $b_i = b_j$ with $i \neq j$, then choosing $c_i = 1$, $c_j = -1$, and all other $c_k$ to be $0$ would result in the contradiction $c_i = c_j = 0$.

By Lemma 13.21, $B$ is a basis for $K$ over $F$. Thus, $\dim_F(K) = |B| = d$.
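The spanning argument in part (4) is effectively an algorithm: divide by $f$ and keep the remainder. The following Python sketch illustrates this in the special case $F = \mathbb{Z}/p\mathbb{Z}$ with $p$ prime. The function name `poly_mod` and the convention of storing a polynomial as a list of coefficients, lowest degree first, are our own assumptions for illustration and are not taken from the text.

```python
def poly_mod(g, f, p):
    """Remainder of g on division by f over Z/pZ.

    Polynomials are coefficient lists, lowest degree first; the last entry
    of f is assumed to be nonzero mod p. The result has length deg(f), so it
    lists the coordinates of g + (f) in the basis 1, x, ..., x^(d-1).
    """
    g = [c % p for c in g]
    d = len(f) - 1
    inv_lead = pow(f[-1], -1, p)          # leading coefficient is a unit mod p
    for k in range(len(g) - 1, d - 1, -1):
        q = (g[k] * inv_lead) % p         # coefficient that cancels the x^k term
        if q:
            for j in range(d + 1):
                g[k - d + j] = (g[k - d + j] - q * f[j]) % p
    r = g[:d]
    return r + [0] * (d - len(r))         # pad short inputs to length d


# Over Z/2Z with f = x^2 + x + 1, the coset of x^3 equals the coset of 1.
print(poly_mod([0, 0, 0, 1], [1, 1, 1], 2))   # [1, 0]
```

The returned list is exactly the tuple $(c_0, \dots, c_{d-1})$ of the proof, which is why every element of $K$ is an $F$-linear combination of the elements of $B$.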




Although Theorem 15.10 may appear highly technical, it has a corollary 
which gets right to the heart of field theory: every non-constant polynomial 
over a field has a root in some extension field. Before proving this corollary, 
we need a lemma on factoring polynomials over a field. 


Lemma 15.11. Let $F$ be a field, and let $f \in F[x] - F$ be a non-constant polynomial. Then $f$ factors into irreducible polynomials: that is, we can write

$$f = \prod_{j=1}^{n} f_j \tag{15.3}$$

for some $n \in \mathbb{Z}^+$ where each $f_j$ is irreducible in $F[x]$.


Proof. We proceed by induction on $d := \deg(f)$.

Base Case: $d = 1$. In this case, $f$ itself is irreducible, by Exercise 14.6. So we may take $n = 1$ and $f_1 = f$.

Inductive Step: Suppose that the assertion of the lemma is true for all polynomials of degree less than $\deg(f)$. If $f$ is irreducible, then we are done. So suppose that $f$ is not irreducible in $F[x]$. This means that we can write $f = g \cdot h$ for some $g, h \in F[x]$ where neither $g$ nor $h$ is a unit in $F[x]$. Thus, $g, h \notin F$, so $\deg(g) \geq 1$ and $\deg(h) \geq 1$. Since $\deg(f) = \deg(g) + \deg(h)$ (by Lemma 14.10), we have $\deg(g) < \deg(f)$ and $\deg(h) < \deg(f)$. Thus our inductive hypothesis applies to $g$ and $h$, so both $g$ and $h$ may be factored into products of irreducible polynomials in $F[x]$. Multiplying two such factorizations yields the desired factorization of $f$.
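The induction in this proof can be read as a recursive procedure. As a rough sketch, and only for the special case $F = \mathbb{Z}/p\mathbb{Z}$ with $p$ prime, the Python code below searches by brute force for a proper monic divisor and recurses on the two factors. The helper names (`trim`, `poly_divmod`, `factor`) and the list representation (lowest degree first) are assumptions made for this illustration; the search is exponentially slow and is meant only to mirror the proof.

```python
from itertools import product


def trim(a):
    """Drop trailing zero coefficients (lists are lowest degree first)."""
    while a and a[-1] == 0:
        a = a[:-1]
    return a


def poly_divmod(g, f, p):
    """Quotient and remainder of g by a nonzero f over Z/pZ."""
    g = [c % p for c in g]
    f = trim([c % p for c in f])
    d = len(f) - 1
    inv_lead = pow(f[-1], -1, p)
    q = [0] * max(len(g) - d, 1)
    for k in range(len(g) - 1, d - 1, -1):
        c = (g[k] * inv_lead) % p
        q[k - d] = c
        for j in range(d + 1):
            g[k - d + j] = (g[k - d + j] - c * f[j]) % p
    return trim(q), trim(g)


def factor(f, p):
    """Factor a non-constant f over Z/pZ into irreducibles."""
    f = trim([c % p for c in f])
    n = len(f) - 1
    for d in range(1, n // 2 + 1):
        for tail in product(range(p), repeat=d):
            g = list(tail) + [1]              # monic candidate of degree d
            q, r = poly_divmod(f, g, p)
            if not r:                         # f = g * q with both degrees < n
                return factor(g, p) + factor(q, p)
    return [f]                                # no proper factor: irreducible


print(factor([1, 0, 1], 2))   # x^2 + 1 over Z/2Z
print(factor([1, 1, 1], 2))   # x^2 + x + 1 over Z/2Z
```

The early `return [f]` plays the role of the sentence "If $f$ is irreducible, then we are done," and the recursive call is the inductive hypothesis applied to $g$ and $h$.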


Corollary 15.12 (Existence of Roots). Let $F$ be a field, and let $f \in F[x] - F$ be a non-constant polynomial. Then there exists an extension field $K$ of $F$ such that $f$ has a root in $K$.


Proof. By Lemma 15.11, there exists a polynomial $g \in F[x]$ such that $g$ is irreducible in $F[x]$ and $g \mid f$. By Theorem 15.10, the field $K = F[x]/(g)$ is an extension field of $F$ in which $g$ has a root. But since $g$ is a factor of $f$, every root of $g$ is also a root of $f$; so $f$ has a root in $K$.


Example 15.13 (Finite Fields). Suppose that $F$ is a finite field; that is, $F$ is a field and $|F| < \infty$. We have already seen how to get our hands on such things: namely, take any prime $p \in \mathbb{Z}$; then by Corollary 12.25, $\mathbb{Z}/p\mathbb{Z}$ is a finite field, which has order $p$. Are there any finite fields whose order is not prime?

It turns out that there are. Our idea is to find an irreducible polynomial $f \in F[x]$ of some degree $d$. Then we can find an extension field $K$ of $F$ in which $f$ has a root: namely, we can let $K = F[x]/(f)$. By Theorem 15.10, we will have $\dim_F(K) = d$. Suppose that $B = \{b_1, \dots, b_d\}$ is a basis for $K$ over $F$. Then every element $a \in K$ has a unique representation as $a = a_1 b_1 + \cdots + a_d b_d$ with $a_i \in F$. Since the number of distinct $d$-tuples $(a_1, \dots, a_d) \in F^d$ is $|F^d| = |F|^d$, it follows that $|K| = |F|^d$.




The upshot of this discussion is that if we can find a polynomial $f \in F[x]$ which is irreducible in $F[x]$ and has degree $d$, then we can find an extension field $K$ of $F$ such that $|K| = |F|^d$. In this way, we may hope to find a field whose order is a power of a prime.

Let us compute a particular example. Let $F = \mathbb{Z}/(2)$, a finite field of order 2. Let us look for an irreducible polynomial in $F[x]$ of degree 2. There are exactly 4 polynomials of degree 2 in $F[x]$; three of these factor non-trivially: $x^2 = x \cdot x$; $x^2 + 1 = (x + 1) \cdot (x + 1)$ (remember that the coefficients of these polynomials are integers modulo two!); $x^2 + x = x \cdot (x + 1)$. This leaves $x^2 + x + 1 =: f$, which is irreducible in $F[x]$ by Exercise 14.7, since $f(0_F) = 1_F = f(1_F)$. So let $K = F[x]/(f)$. Then $K$ is a field of order $2^2 = 4$.
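The search just carried out by hand is easy to automate. The sketch below lists the monic quadratics over $\mathbb{Z}/p\mathbb{Z}$ that have no root in $\mathbb{Z}/p\mathbb{Z}$; we are assuming here that Exercise 14.7 is the usual criterion that a polynomial of degree 2 with no root in $F$ is irreducible in $F[x]$. The function name `rootless_monic_quadratics` is our own, introduced only for this illustration.

```python
from itertools import product


def rootless_monic_quadratics(p):
    """Pairs (c0, c1) such that x^2 + c1*x + c0 has no root in Z/pZ."""
    found = []
    for c0, c1 in product(range(p), repeat=2):
        has_root = any((a * a + c1 * a + c0) % p == 0 for a in range(p))
        if not has_root:
            found.append((c0, c1))
    return found


# Over Z/2Z the only survivor is (c0, c1) = (1, 1), i.e. x^2 + x + 1.
print(rootless_monic_quadratics(2))   # [(1, 1)]
```

Running the same function with `p = 3` would supply irreducible quadratics over $\mathbb{Z}/3\mathbb{Z}$, and hence, by the discussion above, fields of order $3^2 = 9$.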

By the proof of Theorem 15.10, the set $B = \{1_F + (f),\ x + (f)\}$ is a basis for $K$ over $F$. Set $\rho = x + (f)$ and write $1 = 1_F + (f)$ (since $1_F + (f)$ is, after all, the multiplicative identity element of $K$). Then we have $B = \{1, \rho\}$. The four elements of $K$ are the elements $a_1 \cdot 1 + a_2 \cdot \rho$ where $a_i \in F$. Since $F = \{0_F, 1_F\}$, we find $K = \{0, 1, \rho, 1 + \rho\}$. By Theorem 15.10, we know that $\rho$ is a root of $f$, which means that

$$\rho^2 + \rho + 1 = 0. \tag{15.4}$$

We claim that all arithmetic in $K$ is “modulo two.” For example, we have $\rho + \rho = 1_F \cdot \rho + 1_F \cdot \rho = (1_F + 1_F) \cdot \rho = 0_F \cdot \rho = 0$. Even more to the point, we have $-1 = -1_K = -1_F + (f) = 1_F + (f) = 1_K = 1$ in $K$. This fact together with Equation 15.4 determines the structure of $K$ completely. For example, we compute $\rho \cdot \rho = \rho^2 = -(\rho + 1) = \rho + 1$ (since $-1 = 1$ in $K$). In Exercise 15.4, the reader is asked to complete the addition and multiplication tables for the field $K$.
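For readers who like to experiment, here is a minimal Python sketch of the arithmetic in $K$. We represent $a + b\rho$ by the pair `(a, b)` with entries in $\{0, 1\}$; this encoding and the function names `add` and `mul` are our own assumptions. The multiplication rule comes from expanding $(a + b\rho)(c + d\rho)$ and replacing $\rho^2$ by $\rho + 1$, as in the computation above.

```python
def add(u, v):
    """(a + b*rho) + (c + d*rho), coefficients mod 2."""
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)


def mul(u, v):
    """(a + b*rho)(c + d*rho) with rho^2 replaced by rho + 1."""
    a, b = u
    c, d = v
    return ((a * c + b * d) % 2, (a * d + b * c + b * d) % 2)


ZERO, ONE, RHO = (0, 0), (1, 0), (0, 1)
RHO_PLUS_ONE = add(RHO, ONE)

print(add(RHO, RHO) == ZERO)            # rho + rho = 0
print(mul(RHO, RHO) == RHO_PLUS_ONE)    # rho^2 = rho + 1
print(mul(RHO, RHO_PLUS_ONE) == ONE)    # rho and rho + 1 are mutually inverse
```

The last line confirms directly that the two "new" elements of $K$ are units, in agreement with the fact that $K$ is a field.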


15.2 Splitting Fields 


We can take the idea of Corollary 15.12 one step farther, by showing that every non-constant polynomial over a field $F$ has all its roots in some appropriate extension field of $F$. First we make a related definition.


Definition 15.14. Let $F$ be a subfield of $K$, and let $f \in F[x]$. We say that $f$ factors completely over $K$ or splits (completely) over $K$ if we can write

$$f = c \cdot \prod_{i=1}^{n} (x - \rho_i) \tag{15.5}$$

for some $c \in F$ and some elements $\rho_i \in K$.


We note that in Equation 15.5, if $f \neq 0$, then we must have $n = \deg(f)$. Also, each $\rho_i$ is a root of $f$ in $K$; and the leading term of $f$ must be $cx^n$.




Corollary 15.15. Let $F$ be a field, and let $f \in F[x]$. Then there is an extension field $K$ of $F$ such that $f$ factors completely over $K$.


Proof. We proceed by induction on $d := \deg(f)$.

Base Case: $d \leq 0$. Then $f \in F$, so we may take $K = F$ and write $f = c$, which has the form of Equation 15.5 with $n = 0$.

Inductive step: Suppose that $f \in F[x]$ and $\deg(f) \geq 1$. Suppose inductively that for every field $F$ and every polynomial in $F[x]$ of degree less than $\deg(f)$, there is an extension field of $F$ over which that polynomial factors completely.

By Corollary 15.12, there is an extension field $L$ of $F$ such that $f$ has a root $\rho \in L$. By the Root-Factor Theorem (Theorem 14.24), there exists $g \in L[x]$ such that $f = (x - \rho) \cdot g$. Now we must have $\deg(g) = \deg(f) - 1$ by Lemma 14.10, so by inductive hypothesis, there is an extension field $K$ of $L$ such that $g$ factors completely over $K$. It follows that $f$ factors completely over $K$.


Now that we know that every polynomial over a field factors completely 
in some extension of that field, it is natural to ask for a “smallest” extension 
where a given polynomial factors completely. That is the aim of the following 
definition. 


Definition 15.16. Let $F$ be a field, and let $f \in F[x]$. An extension field $K$ of $F$ is called a splitting field for $f$ over $F$ if

(SF1) $f$ factors completely over $K$, and

(SF2) $K$ is minimal with respect to property (SF1): that is, there is no field $L$ such that $f$ factors completely over $L$ and $F \leq L < K$.


We do not yet have any guarantee that a splitting field exists in general. 
Let us try to see how we could construct a splitting field. 

Suppose that $F$ is a field and $f \in F[x] - F$. Let $E$ be an extension field of $F$ such that $f$ factors completely in $E$. Let $\rho_1, \dots, \rho_n$ be the roots of $f$ in $E$. Suppose that $K$ is a splitting field for $f$ over $F$, with $K \leq E$. Then $f$ must factor completely in $K$. So it may seem reasonable that $K$ should at least contain all of the roots $\rho_i$ of $f$. Thus, our idea is to start with $F$ and form the ring of all “polynomials in $\rho_1, \dots, \rho_n$ with coefficients from $F$,” which we will denote $F[\rho_1, \dots, \rho_n]$.

In order to make this notation precise, we introduce two definitions. First, 
as with groups and ideals which can be generated by one element, we have 
a term to describe extension fields which can be generated by throwing in a 
single “new” element: 


Definition 15.17. Let $F \leq K$ be fields. We say that $K$ is a simple extension of $F$ if $\exists \alpha \in K$ such that $K = F[\alpha]$.


Then we proceed to the case of several elements: 


Definition 15.18. Let $F \leq K$ be fields, and let $\rho_1, \dots, \rho_n$ be elements of $K$, with $n \in \mathbb{N}$. Then we recursively define $F[\rho_1, \dots, \rho_n] = (F[\rho_1, \dots, \rho_{n-1}])[\rho_n]$. We read $F[\rho_1, \dots, \rho_n]$ as “$F$ adjoin $\rho_1, \dots, \rho_n$.” When $n = 0$, we define $F[\rho_1, \dots, \rho_n] = F$.




One nice feature of this construction is given by: 


Lemma 15.19. Let $F \leq K$ be fields, and let $\rho_1, \dots, \rho_n$ be elements of $K$. Let $R = F[\rho_1, \dots, \rho_n]$. Then $R$ is the smallest subring of $K$ which contains both the field $F$ and all of the $\rho_i$.


Proof. $R$ certainly contains $F$ and the $\rho_i$. On the other hand, any ring which contains $F$ and the $\rho_i$ must contain all polynomials in the $\rho_i$ with coefficients in $F$, by closure of a ring under addition and multiplication.


In Lemma 15.19, suppose that the $\rho_i$ are all of the roots of a polynomial $f \in F[x]$, as before. Then we may expect that the ring $R = F[\rho_1, \dots, \rho_n]$ is at least a “splitting ring” for $f$ over $F$, if not a splitting field. But latent in our previous results is the amazing fact that, in this situation, $R$ must already be a field!


Proposition 15.20. Let $F \leq K$ be fields, and let $\rho_1, \dots, \rho_n \in K$. Suppose that the $\rho_i$ are algebraic over $F$. Then $F[\rho_1, \dots, \rho_n]$ is a field; it is the smallest extension field of $F$ in $K$ which contains all the $\rho_i$.


Proof. In light of Lemma 15.19, we need only show that $F[\rho_1, \dots, \rho_n]$ is a field. We proceed by induction on $n$. We take our base case to be $n = 0$. In the base case, we have $F[\rho_1, \dots, \rho_n] = F$, which is a field by hypothesis.

For the inductive step, suppose that $L := F[\rho_1, \dots, \rho_n]$ is a field, and that $\rho_{n+1} =: \alpha$ is an element of $K$ which is algebraic over $F$. By Exercise 15.1, $\alpha$ must also be algebraic over $L$. Set $f = \mathrm{Irr}(\alpha, L, x)$.

By Proposition 15.4, we have $L[\alpha] \cong L[x]/(f)$. Since $f$ is irreducible in $L[x]$, Theorem 15.10 says that $L[x]/(f)$ is a field, as desired.


Theorem 15.21. Let $F \leq K$ be fields, and let $f \in F[x]$ be a non-zero polynomial which factors completely over $K$ as

$$f = c \cdot \prod_{i=1}^{n} (x - \rho_i) \tag{15.6}$$

with $c \in F$ and $\rho_i \in K$. Then the ring $L := F[\rho_1, \dots, \rho_n]$ is the unique splitting field for $f$ in $K$.


Proof. We observe that the $\rho_i$ are algebraic over $F$, since they are roots of the non-zero polynomial $f \in F[x]$. Therefore, by Proposition 15.20, $L$ is a field. Now $f$ factors completely over $L$, since $\rho_1, \dots, \rho_n \in L$. Next, suppose that $\tilde{L}$ is another field such that $f$ factors completely over $\tilde{L}$ and $F \leq \tilde{L} \leq K$. It will suffice to show that $\tilde{L} \supseteq L$. Since $f$ factors completely over $\tilde{L}$, we can write

$$f = a \cdot \prod_{i=1}^{n} (x - \omega_i) \tag{15.7}$$

with $a \in F$ and $\omega_i \in \tilde{L}$. Pick one of the $\rho_j$, and observe that, on the one hand, we have $f(\rho_j) = 0$ (from Equation 15.6), while on the other hand, we have $f(\rho_j) = a \cdot \prod_{i=1}^{n} (\rho_j - \omega_i)$ (from Equation 15.7). Therefore,

$$a \cdot \prod_{i=1}^{n} (\rho_j - \omega_i) = 0. \tag{15.8}$$

Now all the terms in Equation 15.8 belong to $K$, which is a field, and thus a domain. Since $f \neq 0$, we have $a \neq 0$. Therefore, we must have $\rho_j = \omega_i$ for some $i$. This shows that $\rho_j \in \tilde{L}$. Since $j$ was arbitrary, we must have $\rho_1, \dots, \rho_n \in \tilde{L}$. By Proposition 15.20, we can say that $\tilde{L} \supseteq L$. This completes the proof.


Corollary 15.22. Let $f \in F[x]$, where $F$ is a field. Then there exists a splitting field $K$ for $f$ over $F$.


Proof. If $f \in F$, then we may take $K = F$. Otherwise, we combine Corollary 15.15 and Theorem 15.21 to get our result.


Example 15.23. Here is a tiny example to illustrate the preceding notions. Set $F = \mathbb{Q}$, $f(x) = x^2 - 2 \in F[x]$, and $K = \mathbb{R}$. Then $f$ factors in $\mathbb{R}[x]$ as $f = (x - \sqrt{2})(x + \sqrt{2})$. Therefore, $\mathbb{Q}[\sqrt{2}, -\sqrt{2}]$ is a splitting field for $f$ over $\mathbb{Q}$. Notice that we don’t need to include $-\sqrt{2}$ here, since $-\sqrt{2} \in \mathbb{Q}[\sqrt{2}]$. Thus, $\mathbb{Q}[\sqrt{2}]$ is already a splitting field for $f$ over $\mathbb{Q}$.
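Proposition 15.20 tells us that $\mathbb{Q}[\sqrt{2}]$ is a field, so every non-zero element $a + b\sqrt{2}$ must have an inverse of the same shape. The Python sketch below makes this concrete using exact rational arithmetic. The pair encoding `(a, b)` for $a + b\sqrt{2}$ and the names `mul` and `inverse` are assumptions of ours; the inverse formula $(a - b\sqrt{2})/(a^2 - 2b^2)$ is the standard "rationalize the denominator" trick and relies on the irrationality of $\sqrt{2}$ to guarantee a non-zero denominator.

```python
from fractions import Fraction


def mul(u, v):
    """(a + b*sqrt2)(c + d*sqrt2) = (ac + 2bd) + (ad + bc)*sqrt2."""
    a, b = u
    c, d = v
    return (a * c + 2 * b * d, a * d + b * c)


def inverse(u):
    """Inverse of a non-zero a + b*sqrt2, again of the form c + d*sqrt2."""
    a, b = u
    norm = a * a - 2 * b * b      # non-zero because sqrt2 is irrational
    return (a / norm, -b / norm)


sqrt2 = (Fraction(0), Fraction(1))
element = (Fraction(3), Fraction(-1))            # 3 - sqrt2

print(mul(sqrt2, sqrt2) == (2, 0))               # sqrt2 is a root of x^2 - 2
print(mul(element, inverse(element)) == (1, 0))  # the inverse stays in Q[sqrt2]
```

Here `inverse(element)` is the pair $(3/7, 1/7)$, that is, $(3 + \sqrt{2})/7$.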


Suppose that $F$ is a field and $\rho_1, \dots, \rho_n$ are the roots of an irreducible polynomial $f \in F[x]$ in some extension field $K$ of $F$. We can form the splitting field $F[\rho_1, \dots, \rho_n]$ of $f$ over $F$ in a series of steps (as it was originally defined):

$$\begin{array}{c}
F[\rho_1, \dots, \rho_n] \\
| \\
\vdots \\
| \\
F[\rho_1, \rho_2] \\
| \\
F[\rho_1] \\
| \\
F
\end{array} \tag{15.9}$$

We refer to Diagram 15.9, naturally enough, as a tower of fields. (Sometimes, we may speak of a series of field extensions $F_1 \leq F_2 \leq \cdots \leq F_n$ as a “tower” even when it is displayed horizontally.)

Recall that whenever we have an extension of fields $L \leq K$, we may view $K$ as a vector space over $L$, and thus speak of the dimension $\dim_L(K)$. This dimension is one of the most important invariants of a field extension. Befitting its importance, this invariant has more than one name and notation:


Definition 15.24. If $L \leq K$ are fields, then the index of $L$ in $K$, also called the degree of $K$ over $L$, is the integer $[K : L] := \dim_L(K)$. We say that $K$ is a finite extension of $L$ if $[K : L] < \infty$.


Now set $F_i = F[\rho_1, \dots, \rho_i]$ for $i \in \{0, 1, \dots, n\}$. (We take $F_0 = F$.) Note that $\rho_i$ is algebraic over $F_{i-1}$, since $\rho_i$ is algebraic over $F$ and $F \leq F_{i-1}$. Set $f_i = \mathrm{Irr}(\rho_i, F_{i-1}, x) \in F_{i-1}[x]$. By Proposition 15.4, we have $F_{i-1}[\rho_i] \cong F_{i-1}[x]/(f_i)$. By Theorem 15.10, we then have $[F_i : F_{i-1}] = \deg(f_i)$ if $1 \leq i \leq n$. This last equation is useful enough to single out as a general lemma:


Lemma 15.25. Let $L \leq K$ be fields, let $\alpha \in K$, and suppose that $\alpha$ is algebraic over $L$. Let $d = \deg(\mathrm{Irr}(\alpha, L, x))$. Then $\{1, \alpha, \alpha^2, \dots, \alpha^{d-1}\}$ is a basis of $L[\alpha]$ over $L$, and $[L[\alpha] : L] = d$.


Proof. Exercise 15.8. 


How is the overall degree $[F_n : F]$ related to the intermediate degrees $[F_i : F_{i-1}]$? The answer is given by the following lemma, which says that degrees of field extensions multiply.


Lemma 15.26. If $F \leq L \leq K$ are fields and the degrees $[K : L]$ and $[L : F]$ are finite, then $[K : F]$ is also finite, and we have

$$[K : F] = [K : L] \cdot [L : F]. \tag{15.10}$$


Proof. Let $m = [L : F]$ and $n = [K : L]$. Then there are bases $A = \{a_1, \dots, a_m\}$ of $L$ over $F$ and $B = \{b_1, \dots, b_n\}$ of $K$ over $L$. The idea is to show that the set $C := \{a_i b_j : 1 \leq i \leq m,\ 1 \leq j \leq n\}$ is a basis of $K$ over $F$.

Let $\alpha \in K$. Since $B$ is a basis of $K$ over $L$, there exist unique elements $x_1, \dots, x_n \in L$ such that $\alpha = \sum_{j=1}^{n} x_j b_j$. Since $A$ is a basis of $L$ over $F$, for each $j$ there exist unique elements $w_{1,j}, \dots, w_{m,j} \in F$ such that $x_j = \sum_{i=1}^{m} w_{i,j} a_i$. Thus we can write $\alpha = \sum_{j=1}^{n} \sum_{i=1}^{m} w_{i,j} a_i b_j$. This shows that $C$ spans $K$ over $F$.

Next, suppose that we have $0 = \sum_{i,j} w_{i,j} a_i b_j$ with $w_{i,j} \in F$. We can rewrite this as $0 = \sum_{j=1}^{n} \big(\sum_{i=1}^{m} w_{i,j} a_i\big) b_j$, where each inner sum lies in $L$. Since $B$ is a basis of $K$ over $L$, Lemma 13.21 implies that $\sum_{i=1}^{m} w_{i,j} a_i = 0$ for each $j$. But since $A$ is a basis of $L$ over $F$, the same lemma gives $w_{i,j} = 0$ for all $i$ and $j$.

Our final point in this proof is that $|C| = mn$; that is, the elements $a_i b_j$ are distinct for distinct values of $i$ and $j$. For if we had $a_i b_j = a_q b_r$ with $(i, j) \neq (q, r)$, then we would have $0 = 1_F \cdot a_i b_j + (-1_F) \cdot a_q b_r$, which contradicts the result obtained in the previous paragraph regarding the uniqueness of representation of $0$. Using Lemma 13.21 in the backwards direction, we deduce that $C$ is a basis of $K$ over $F$. We can now say that $[K : F] = |C| = mn$, as desired.
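As a small illustration of how the basis $C$ is assembled, consider the tower $\mathbb{Q} \leq \mathbb{Q}[\sqrt{2}] \leq \mathbb{Q}[\sqrt{2}, i]$ inside $\mathbb{C}$. This particular tower is our own choice of example. We take for granted that $\{1, \sqrt{2}\}$ is a basis of $\mathbb{Q}[\sqrt{2}]$ over $\mathbb{Q}$, and that $\{1, i\}$ is a basis of $\mathbb{Q}[\sqrt{2}, i]$ over $\mathbb{Q}[\sqrt{2}]$ (by Lemma 15.25, since $i$ is a root of $x^2 + 1$ and $i \notin \mathbb{R}$).

```latex
\[
A = \{1, \sqrt{2}\}, \qquad B = \{1, i\}, \qquad
C = \{a b : a \in A,\ b \in B\} = \{1,\ \sqrt{2},\ i,\ i\sqrt{2}\},
\]
\[
[\mathbb{Q}[\sqrt{2}, i] : \mathbb{Q}]
  = [\mathbb{Q}[\sqrt{2}, i] : \mathbb{Q}[\sqrt{2}]] \cdot [\mathbb{Q}[\sqrt{2}] : \mathbb{Q}]
  = 2 \cdot 2 = 4 = |C|.
\]
```

Every element of $\mathbb{Q}[\sqrt{2}, i]$ can thus be written uniquely as $w_1 + w_2\sqrt{2} + w_3 i + w_4 i\sqrt{2}$ with $w_k \in \mathbb{Q}$.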




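Example 15.27. Let $F = \mathbb{Q}$ be our “ground field,” as we sometimes refer to the smallest field in a tower. Let $f(x) = x^4 - 2 \in \mathbb{Q}[x]$. Then we can factor $f$ in $\mathbb{C}[x]$ as follows: $f(x) = (x^2 - \sqrt{2}) \cdot (x^2 + \sqrt{2}) = (x - \sqrt[4]{2})(x + \sqrt[4]{2})(x - i\sqrt[4]{2})(x + i\sqrt[4]{2})$. Thus, $f$ factors completely in $\mathbb{C}[x]$. The set of all roots of $f$ in $\mathbb{C}$ is $R_f = \{\pm\sqrt[4]{2},\ \pm i\sqrt[4]{2}\} = \{i^k \cdot \sqrt[4]{2} : k = 0, 1, 2, 3\}$. Let $K = \mathbb{Q}[\sqrt[4]{2}, -\sqrt[4]{2}, i\sqrt[4]{2}, -i\sqrt[4]{2}]$. By Theorem 15.21, $K$ is the unique splitting field for $f$ in $\mathbb{C}$.

Set $L = \mathbb{Q}[\sqrt[4]{2}]$. By Proposition 15.20, $L$ is a field, since $\sqrt[4]{2}$ is algebraic over $\mathbb{Q}$; moreover, this proposition tells us that $L$ is the smallest field which contains both $\mathbb{Q}$ and $\sqrt[4]{2}$. Since the real field $\mathbb{R}$ contains both $\mathbb{Q}$ and $\sqrt[4]{2}$, we must have $L \subseteq \mathbb{R}$.

We note that $-\sqrt[4]{2} \in \mathbb{Q}[\sqrt[4]{2}]$ since $-1 \in \mathbb{Q}$. Similarly, $-i\sqrt[4]{2} \in \mathbb{Q}[\sqrt[4]{2}, i\sqrt[4]{2}]$. Therefore, $\mathbb{Q}[\sqrt[4]{2}, i\sqrt[4]{2}] = K$. Consider the tower of field extensions

$$\begin{array}{c}
\mathbb{Q}[\sqrt[4]{2}, i\sqrt[4]{2}] = K \\
| \\
\mathbb{Q}[\sqrt[4]{2}] = L \\
| \\
\mathbb{Q} = F
\end{array}$$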


We claim that $f = \mathrm{Irr}(\sqrt[4]{2}, \mathbb{Q}, x)$. We have $f \in \mathbb{Q}[x]$, $f$ is monic, and $f(\sqrt[4]{2}) = 0$. Assume for a contradiction that $f$ factors in $\mathbb{Q}[x]$ as $f = g \cdot h$, where $g, h \in \mathbb{Q}[x] - \mathbb{Q}$. We may assume (after multiplying $g$ and $h$ by suitable non-zero rational numbers) that $g$ and $h$ are monic. By the Root-Factor Theorem (Theorem 14.24), neither $g$ nor $h$ can be linear, since $f$ has no roots in $\mathbb{Q}$. Therefore, $\deg(g) = \deg(h) = 2$. The four distinct roots of $f$ in $\mathbb{C}$ must be roots of $g$ or $h$. Since $g$ and $h$ can have at most two distinct roots each (by Theorem 14.25), we must have $g(\rho_1) = g(\rho_2) = 0$ for two distinct roots $\rho_1$ and $\rho_2$ of $f$ in $\mathbb{C}$. This forces $g = (x - \rho_1)(x - \rho_2)$ (see Exercise 15.11). Since $g \in \mathbb{Q}[x]$, we must have $\rho_1 \rho_2 \in \mathbb{Q}$. Looking at the roots of $f$ in $\mathbb{C}$, we see that the product $\rho_1 \rho_2$ of two of them must be $\pm\sqrt{2}$ or $\pm i\sqrt{2}$, none of which are in $\mathbb{Q}$. This contradiction shows that $f$ is irreducible in $\mathbb{Q}[x]$. Thus, by Exercise 15.2, $f = \mathrm{Irr}(\sqrt[4]{2}, \mathbb{Q}, x)$, as claimed. From Lemma 15.25 it now follows that $[L : \mathbb{Q}] = [\mathbb{Q}[\sqrt[4]{2}] : \mathbb{Q}] = \deg(f) = 4$.

Since $K \not\subseteq \mathbb{R}$ but $L \subseteq \mathbb{R}$, we must have $K \neq L$. By Exercise 15.5, this forces $[K : L] > 1$. On the other hand, $i\sqrt[4]{2}$ is a root of the polynomial $x^2 + \sqrt{2} \in L[x]$. Therefore, $\deg(\mathrm{Irr}(i\sqrt[4]{2}, L, x)) \leq 2$, and so $[K : L] \leq 2$. It follows that $[K : L] = 2$. By Lemma 15.26, we have $[K : \mathbb{Q}] = 8$.
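The key step in the irreducibility argument of Example 15.27 is the claim about products of two distinct roots of $x^4 - 2$. The Python sketch below checks that claim numerically. It uses floating-point complex arithmetic with an arbitrary tolerance of $10^{-9}$, so it is a sanity check and not a proof; the function name `pair_products` is our own.

```python
import math
from itertools import combinations


def pair_products():
    """Products of two distinct complex roots of x^4 - 2 (floating point)."""
    r = 2 ** 0.25                               # the real fourth root of 2
    roots = [r * 1j ** k for k in range(4)]     # i^k * r for k = 0, 1, 2, 3
    return [a * b for a, b in combinations(roots, 2)]


s = math.sqrt(2)
targets = [s, -s, s * 1j, -s * 1j]              # +-sqrt(2) and +-i*sqrt(2)

# Each of the six products is numerically equal to one of the four targets.
print(all(any(abs(z - t) < 1e-9 for t in targets) for z in pair_products()))
```

Since none of the four target values is rational, no monic quadratic with rational coefficients can have two of the roots of $f$ as its roots, which is exactly what the argument needs.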

Remark 15.28. Exercise 15.2 tells us that $f$ is the irreducible polynomial over $\mathbb{Q}$ for any of the roots of $f$, not just for $\sqrt[4]{2}$. Thus, for instance, we have $f = \mathrm{Irr}(i\sqrt[4]{2}, \mathbb{Q}, x)$. It follows that $[\mathbb{Q}[i\sqrt[4]{2}] : \mathbb{Q}] = 4$. Yet notice that $[L[i\sqrt[4]{2}] : L] = [K : L] = 2$. In general, adding an algebraic quantity to an extension field carries less of a penalty in terms of degree than adding that same quantity to the ground field; this is because a polynomial which is irreducible over the ground field may factor non-trivially over an extension field. Further, this phenomenon always occurs when we add successive roots of the same irreducible polynomial to a given ground field; for by the Root-Factor Theorem, we are peeling off at least one linear factor every time we add a root to an extension field. For details, see Exercise 15.10.


Remark 15.29. The hardest part of Example 15.27 was showing that the polynomial $f$ is irreducible in $\mathbb{Q}[x]$. In general, it can be difficult to decide whether a given polynomial is irreducible over a given field; there is, however, a test for irreducibility called Eisenstein’s criterion, which applies in this example (see Lemma 21.5).


15.3 Exercises


Exercise 15.1. Suppose that $F \leq L \leq K$ is a tower of field extensions, $\alpha \in K$, and $\alpha$ is algebraic over $F$. Prove that $\alpha$ is also algebraic over $L$.

Exercise 15.2. Let $F \leq K$ be fields and let $\alpha \in K$. Suppose that $f(x) \in F[x]$ is a non-zero monic polynomial, irreducible in $F[x]$, and that $f(\alpha) = 0$. Prove that $f = \mathrm{Irr}(\alpha, F, x)$.

Exercise 15.3. Assume the hypotheses and notation of Theorem 15.10. Justify the following statements involving computations in $K$:

(a) $(x + (f))^2 = x^2 + (f)$.

(b) For all positive integers $j$, we have $(x + (f))^j = x^j + (f)$.

(c) For all $a \in F$ and all positive integers $j$, we have $a \cdot (x^j + (f)) = ax^j + (f)$. (Hint: Recall how we are identifying $F$ with a certain subfield of $K$.)

(d) By now you should find it straightforward to justify Equation 15.2; do this.
Exercise 15.4. Compute the addition and multiplication tables for the finite field $K$ of order 4 described in Example 15.13.

Exercise 15.5. Let $F \leq K$ be fields. Prove that $[K : F] = 1$ iff $K = F$.

Exercise 15.6 (Finite Implies Algebraic). Let $F \leq K$ be fields with $[K : F] < \infty$. Prove that $K$ is algebraic over $F$.

Exercise 15.7. Let $K$ be a finite extension field of $F$. Prove that there exist finitely many elements $\alpha_1, \dots, \alpha_n$ of $K$ such that $K = F[\alpha_1, \dots, \alpha_n]$.

Exercise 15.8. Prove Lemma 15.25.

Exercise 15.9. Let $F \leq L \leq K$ be fields such that $[K : F]$ is finite. Prove that both $[K : L]$ and $[L : F]$ are finite. (This allows us, for example, to use Lemma 15.26 under more general hypotheses than those stated in that Lemma.)




Exercise 15.10. Let $F$ be a field, and let $f \in F[x]$, with $\deg(f) = n \geq 1$. Write $f = c \cdot \prod_{i=1}^{n} (x - \rho_i)$ with $c \in F$ and $\rho_1, \dots, \rho_n$ in some splitting field $K$ for $f$ over $F$, and set $F_i = F[\rho_1, \dots, \rho_i]$ for $i$: $0 \leq i \leq n$. Note that $F_0 = F$ (since we are adjoining no extra elements in that case) and $F_n = K$ (by Theorem 15.21). Prove that we have $[F_i : F_{i-1}] \leq n - i + 1$ for all $i$: $1 \leq i \leq n$. Conclude that $[K : F] \leq n!$.

Exercise 15.11. Let $F$ be a field and $f \in F[x]$.

(a) Suppose that $\alpha$ and $\beta$ are distinct roots of $f$ in $F$. Prove that $(x - \alpha)(x - \beta)$ divides $f$ in $F[x]$.

(b) Generalize part (a) to the situation where $\alpha_1, \dots, \alpha_r$ are $r$ distinct roots of $f$ in $F$. (This result can be seen as a step towards proving a unique factorization theorem in the ring $F[x]$; we will examine such things more thoroughly in Section 20.3.)


Exercise 15.12. Prove that in Example 15.27, the splitting field $K$ in $\mathbb{C}$ for $x^4 - 2$ over $\mathbb{Q}$ can also be written as $K = \mathbb{Q}[\sqrt[4]{2}, i]$.


Exercise 15.13. Let $F \leq K$ be fields with $[K : F] < \infty$, and suppose that $F$ is infinite and that there are only finitely many fields $L$ between $F$ and $K$. (This hypothesis may seem strange, but it occurs in many important cases; for an example, see the proof of Theorem 16.32, the Fundamental Theorem of Galois Theory.) Prove that $K$ is a simple extension of $F$. Hint: Exercise 13.10 could be useful here.


Exercise 15.14. Let $F \leq K$ be fields with $[K : F] = r < \infty$, and suppose that $F$ is finite, of order $n$.

(a) Prove that $|K| = n^r$.

(b) Prove that every non-zero element of $K$ is a root of the polynomial $f_r(x) := x^{n^r - 1} - 1 \in F[x]$.

(c) Let $\alpha \in K - (0)$ and suppose that $F[\alpha] \neq K$. Prove that there is a proper factor $m$ of $r$ such that $\alpha$ is a root of the polynomial $f_m(x)$.

(d) Using the crude estimate that the number of proper factors of $r$ is at most $r/2$, prove that $\exists \alpha \in K - (0)$ such that $F[\alpha] = K$; that is, every finite extension of a finite field is simple. Hint: The inequality $t(n^t - 1) < n^{2t} - 1$ is true for real numbers $t \geq 1$ (show this!).


Exercise 15.15. In this exercise, you will show that the field of complex numbers $\mathbb{C}$ has no quadratic extensions (that is, extension fields of degree 2 over $\mathbb{C}$). Essentially, the reason is that $\mathbb{C}$ already contains square roots of all its elements.

(a) Let $\alpha \in \mathbb{C}$. Let $r = |\alpha|$, the absolute value of $\alpha$. Show that we can write $\alpha = r(\cos(\theta) + i\sin(\theta))$ for some real number $\theta \in [0, 2\pi)$.

(b) Let $\beta = \sqrt{r}(\cos(\theta/2) + i\sin(\theta/2))$. Show that $\beta^2 = \alpha$.

(c) Prove that every monic quadratic polynomial $f \in \mathbb{C}[x]$ factors non-trivially in $\mathbb{C}[x]$ as a product of linear polynomials.

(d) Conclude that there is no field $F$ such that $[F : \mathbb{C}] = 2$.


Exercise 15.16. We have seen (Example 15.8 combined with Lemma 15.25) that the index $[\mathbb{Q}[\sqrt{2}] : \mathbb{Q}]$ is 2 as a field extension. But what is this index when $\mathbb{Q}$ and $\mathbb{Q}[\sqrt{2}]$ are considered as groups under addition?


Exercise 15.17. In this exercise (which feels more like analysis than algebra), you will establish the existence of real numbers which are transcendental over $\mathbb{Q}$. Let $f \in \mathbb{Q}[x]$ be a polynomial of degree $r > 0$, and suppose that $\alpha$ is a root of $f$ in $\mathbb{R}$.

(a) Let $D \in \mathbb{Z}^+$ be a common denominator for $f$: that is, $D \cdot f \in \mathbb{Z}[x]$. Show that if $q = n/d \in \mathbb{Q}$ with $n, d \in \mathbb{Z}$, $d \neq 0$, and $f(q) \neq 0$, then we have $|f(q)| \geq 1/(|d|^r D)$.

(b) Show that there is a constant $M$ which depends only on $f$ such that $|f(\alpha + \delta)| \leq M \cdot |\delta|$ for every $\delta \in [-1, 1]$.

The idea now is to exploit the tension between the inequalities from parts (a) and (b) above, which go in opposite directions, by constructing a real number $\alpha$ that is absurdly close to (but not equal to) an entire sequence of rational numbers. To this end:

(c) Set $\alpha = \sum_{j=0}^{\infty} 2^{-j!}$. Show (e.g. by comparison with a geometric series) that the series for $\alpha$ is convergent, and that $|\alpha - \alpha_k| < 2^{1-(k+1)!}$, where $\alpha_k$ is the $k$th partial sum for $\alpha$, i.e., $\alpha_k = \sum_{j=0}^{k} 2^{-j!}$.

(d) Assume for a contradiction that $\alpha$ is algebraic over $\mathbb{Q}$, and set $f = \mathrm{Irr}(\alpha, \mathbb{Q}, x)$. Note that $\alpha_k \in \mathbb{Q}$, and reach a contradiction by substituting $\alpha_k$ for $q$ in part (a) and $\alpha_k - \alpha$ for $\delta$ in part (b).


Exercise 15.18. Let $F \leq K$ be a field extension.

(a) Prove that if $\alpha \in K$ and $\alpha$ is algebraic over $F$, then $F[\alpha]$ is a field which is algebraic over $F$.

(b) Prove that if $\alpha_1, \alpha_2 \in K$ and each $\alpha_i$ is algebraic over $F$, then $F[\alpha_1, \alpha_2]$ is a field and is algebraic over $F$.

(c) Set $L = \{\alpha \in K \mid \alpha \text{ is algebraic over } F\}$. Use part (b) above to prove that $L$ is a field and $F \leq L \leq K$.




16 


Galois Theory 


16.1 Field Embeddings 


Evariste Galois is credited with inventing the concept of a group, as well as 
the notions of normal subgroup, symmetric groups, and a host of other ideas 
central to abstract algebra. Yet even among these epochal inventions, the 
contribution of Galois to the theory of automorphisms of field extensions is 
so astonishingly beautiful that it was singled out to bear his name: Galois 
Theory. In this wonderful branch of mathematics, many of the key ideas of 
algebra come together to create a symphony of incomparable splendor. 

We shall see that automorphisms of fields are intimately related to roots of polynomials. Before we study automorphisms, however, we will step back to the more general case of field isomorphisms. Since the only ideals of a field $L$ are $(0)$ and the field $L$ itself (by Exercise 11.7), the only possibilities for a ring homomorphism $L \to R$ are embeddings and the zero map; thus, we do not really lose anything by speaking of field embeddings. The first result in this chapter says that any field isomorphism naturally extends to an isomorphism of the corresponding polynomial rings.


Lemma 16.1. Let $\sigma : L \to M$ be an isomorphism of fields. Then $\sigma$ extends to an isomorphism $\bar{\sigma} : L[x] \to M[x]$, where

$$\bar{\sigma}\Big(\sum_{j=0}^{n} c_j x^j\Big) = \sum_{j=0}^{n} \sigma(c_j) x^j. \tag{16.1}$$


Proof. Since $M \leq M[x]$, we can consider $\sigma$ to be an embedding $\sigma : L \to M[x]$. By Exercise 14.10, there is a unique extension of $\sigma$ to a ring homomorphism $\bar{\sigma} : L[x] \to M[x]$ such that $\bar{\sigma}(x) = x$. Exercise 16.1 asks the reader to complete the proof.


Our approach to the study of automorphisms of fields is somewhat indirect. Given a field $K$, we wish to study $\mathrm{Aut}(K)$, the group of automorphisms of $K$. More generally, given a field extension $F \leq K$, we wish to study those automorphisms of $K$ which restrict to the identity map on $F$; we call such things automorphisms of $K$ over $F$. In even more generality, we define:


DOI: 10.1201/9781003252139-16




Definition 16.2. Let $R$, $S$, and $T$ be commutative rings with 1 such that $R \leq S$ and $R \leq T$. A ring homomorphism $\sigma : S \to T$ is said to be over $R$ if $\sigma(a) = a$ for every $a \in R$.


The set of all automorphisms of $K$ over $F$ has a special notation:

Notation 16.3. Let $F$ and $K$ be fields with $F \leq K$. Then

$$\mathrm{Aut}(K/F) := \{\sigma \in \mathrm{Aut}(K) : \forall a \in F,\ \sigma(a) = a\}, \tag{16.2}$$

the automorphism group of $K$ over $F$ (see Exercise 16.2).


Remark 16.4. The notation $K/F$ here does not denote any sort of quotient. When we say that an automorphism is “over $F$,” we are using the term “over $F$” as an adjective describing a property of the automorphism, not performing division. Here, the sense of “over $F$” is that $F$ is our ground field, which we consider to be inviolate, and not subject to the whims of an automorphism; $F$ is also called the fixed field, for evident reasons.

Our primary case of interest is when the extension is finite: that is, when $[K : F] < \infty$. Our strategy will be to start at the bottom, with the identity automorphism $\mathrm{id}_F : F \to F$. Since $[K : F] < \infty$, we can construct $K$ by adjoining a finite number of elements of $K$ to $F$; that is, we can write

$$K = F[\alpha_1, \dots, \alpha_r] \tag{16.3}$$

for some $\alpha_1, \dots, \alpha_r \in K$ (by Exercise 15.7). We successively ask, for increasing values of $i$, what are the embeddings of $F[\alpha_1, \dots, \alpha_i]$ into $K$ over $F$? When we reach $i = r$, we will know all the embeddings of $K$ into $K$ over $F$, which are simply the automorphisms of $K$ over $F$ (according to Exercise 16.22).

At a typical intermediate stage, we will know all the embeddings of $L := F[\alpha_1, \dots, \alpha_i]$ into $K$ over $F$, and we will want to understand the embeddings of $L[\alpha_{i+1}]$ into $K$. Now, an embedding $\sigma : L \hookrightarrow K$ has an image $M = \sigma(L)$ with the property that $\sigma : L \to M$ is an isomorphism. Writing $\alpha$ for $\alpha_{i+1}$, we note that $\alpha$ is algebraic over $L$ by Exercise 15.6. Let $f = \mathrm{Irr}(\alpha, L, x)$. An extension of $\sigma$ to an embedding $\tau : L[\alpha] \hookrightarrow K$ is completely determined by what it does to $\alpha$. Since $\alpha$ is a root of the polynomial $f \in L[x]$, we might expect that $\tau(\alpha)$ is a root of $\bar{\sigma}(f) \in M[x]$. The following result confirms our suspicions in a quite satisfying way.


Theorem 16.5 (Extending Field Isomorphisms). Suppose that $L$ and $M$ are subfields of the fields $\mathcal{L}$ and $\mathcal{M}$, respectively. Suppose that $\sigma : L \to M$ is an isomorphism. Let $\alpha \in \mathcal{L}$ and suppose that $\alpha$ is algebraic over $L$. Let $\bar{\sigma} : L[x] \to M[x]$ be as in Lemma 16.1. Set $f = \mathrm{Irr}(\alpha, L, x)$ and $g = \bar{\sigma}(f) \in M[x]$. Then $g$ is irreducible in $M[x]$, and the extensions of $\sigma$ to an embedding of $L[\alpha]$ into $\mathcal{M}$ correspond bijectively with the roots of $g$ in $\mathcal{M}$. Namely, for every root $\beta$ of $g$ in $\mathcal{M}$, there is a unique extension of $\sigma$ to an isomorphism from $L[\alpha]$ to $M[\beta]$ which maps $\alpha$ to $\beta$; and every extension of $\sigma$ to an embedding of $L[\alpha]$ into $\mathcal{M}$ is of this form.




Proof. By Lemma 16.1, we can extend $\sigma$ to an isomorphism $\tilde{\sigma} : L[x] \to M[x]$
sending $x$ to itself. Set $g = \tilde{\sigma}(f)$. Let $\beta \in \tilde{M}$ be a root of $g$. The idea of the
proof is to show that the unique candidate for the desired map $\tau : L[\alpha] \to M[\beta]$,
namely

$$\sum_{j=0}^{n} c_j \alpha^j \;\mapsto\; \sum_{j=0}^{n} \sigma(c_j)\,\beta^j, \tag{16.4}$$

is a well-defined isomorphism. To accomplish this, it is convenient to first
step back to the level of polynomial rings. Now let $\psi : M[x] \to M[\beta]$ be
the evaluation-at-$\beta$ map. Set $\pi = \psi \circ \tilde{\sigma}$. It is not hard to verify (Exercise
16.3) that $\mathrm{Irr}(\beta, M, x) = g$. Therefore, we have $\ker(\psi) = (g)$. It follows that
$\ker(\pi) = \tilde{\sigma}^{-1}((g)) = (f)$. By the Fundamental Theorem of Ring Homomorphisms
(Theorem 11.36), $\pi$ induces an isomorphism $L[x]/(f) \cong \pi(L[x]) = M[\beta]$.
But $L[x]/(f) \cong L[\alpha]$ via the map sending $x$ to $\alpha$ (see Proposition 15.4).
It follows that there is an isomorphism $\tau : L[\alpha] \to M[\beta]$ which extends $\sigma$ and
which sends $\alpha$ to $\beta$. The map $\tau$ must be given by Formula 16.4, in order to be
a ring homomorphism with the properties just stated. This shows that every
root $\beta$ of $g$ in $\tilde{M}$ gives a distinct embedding of $L[\alpha]$ into $\tilde{M}$ which extends $\sigma$.

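To complete the proof, let $\alpha_1, \ldots, \alpha_r$ be the distinct roots of $g$ in $\tilde{M}$.
Suppose that $\tau : L[\alpha] \hookrightarrow \tilde{M}$ is an embedding which extends $\sigma$; that is,
$\tau|_L = \sigma$. Let $\beta = \tau(\alpha) \in \tilde{M}$. Write $f = \sum_{j=0}^{d} b_j x^j$ with $b_j \in L$. Then
$g = \tilde{\sigma}(f) = \sum_{j=0}^{d} \sigma(b_j) x^j$. Thus, we have

$$\tau(f(\alpha)) = \tau\Big(\sum_{j=0}^{d} b_j \alpha^j\Big) = \sum_{j=0}^{d} \tau(b_j)\,\tau(\alpha^j) = \sum_{j=0}^{d} \tau(b_j)\,(\tau(\alpha))^j = \sum_{j=0}^{d} \tau(b_j)\,\beta^j = \sum_{j=0}^{d} \sigma(b_j)\,\beta^j = g(\beta).$$

But also, we have $f(\alpha) = 0$ since $f = \mathrm{Irr}(\alpha, L, x)$, and so $\tau(f(\alpha)) = \tau(0) = 0$.
It follows that $g(\beta) = 0$. Thus $\beta = \alpha_i$ for some $i$, and $\tau$ is the map given by
Formula 16.4.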


To get some idea of what Theorem 16.5 is saying, let us consider the special
case when $L = M$ and $\sigma = \mathrm{id}_L$ is the identity map. So let $L$ be any field, let
$f \in L[x]$ be irreducible in $L[x]$, and let $\alpha$, $\beta$ be roots of $f$ in some extension
field $\tilde{L}$ of $L$. Then Theorem 16.5 asserts the existence of an isomorphism
$\tau : L[\alpha] \to L[\beta]$ making the following diagram commute (the vertical maps
are the natural embeddings):

$$\begin{array}{ccc}
L[\alpha] & \xrightarrow{\ \tau\ } & L[\beta] \\
\uparrow & & \uparrow \\
L & \xrightarrow{\ \mathrm{id}_L\ } & L
\end{array} \tag{16.5}$$

The big idea here is that any two roots $\alpha$ and $\beta$ of the irreducible polynomial
$f$ "look alike over $L$": there is an isomorphism $\tau : L[\alpha] \to L[\beta]$ which
fixes $L$ pointwise and sends $\alpha$ to $\beta$.


Example 16.6. The complex numbers $i$ and $-i$ "look alike" over the real field
$\mathbb{R}$. Both quantities are roots of the polynomial $x^2 + 1$, which is irreducible in
$\mathbb{R}[x]$. Consequently, there is an isomorphism $\tau : \mathbb{R}[i] \to \mathbb{R}[-i]$ which fixes
$\mathbb{R}$ pointwise and sends $i$ to $-i$. Now $\mathbb{R}[i] = \mathbb{R}[-i] = \mathbb{C}$, and the map $\tau$ is just
complex conjugation.
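
The claims of Example 16.6 can be spot-checked numerically. The following is only a sketch on a finite grid of sample values (the grid and the name `tau` are choices made here for illustration); it does not prove anything, but it shows what "fixes the reals and respects both operations" means concretely.

```python
# Spot-check that complex conjugation behaves like the isomorphism tau of
# Example 16.6: it fixes real numbers, sends i to -i, and respects + and *.

def tau(z: complex) -> complex:
    """Complex conjugation, viewed as a map from C to C."""
    return z.conjugate()

# Integer-valued samples keep the floating-point arithmetic exact.
samples = [complex(a, b) for a in range(-3, 4) for b in range(-3, 4)]

fixes_reals = all(tau(complex(a, 0)) == complex(a, 0) for a in range(-3, 4))
sends_i_to_minus_i = tau(1j) == -1j
additive = all(tau(z + w) == tau(z) + tau(w) for z in samples for w in samples)
multiplicative = all(tau(z * w) == tau(z) * tau(w) for z in samples for w in samples)

print(fixes_reals, sends_i_to_minus_i, additive, multiplicative)
```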


Motivated by Example 16.6, we make the following definition. 


Definition 16.7. Let $F$ be a subfield of $K$ and let $\alpha, \beta \in K$. We say that $\alpha$
and $\beta$ are conjugate over $F$ if there exists an isomorphism $\tau : F[\alpha] \to F[\beta]$
such that $\tau$ is the identity map on $F$ and $\tau(\alpha) = \beta$.


Remark 16.8. The definition of conjugate above is distinct from the definition 
of conjugate in group theory, Definition 7.15. It should be clear from con- 
text which definition is meant. In general, a “conjugate” is something closely 
related to—conjoined or coupled with—some original object. In both groups 
and fields, we note that a conjugate is the image of the original object under 
an isomorphism. 


Example 16.9. The numbers $\sqrt{2}$ and $-\sqrt{2}$ are conjugate over $\mathbb{Q}$. For both
numbers are roots of the polynomial $x^2 - 2$, which is irreducible in $\mathbb{Q}[x]$.
Compare this discussion to the results we obtained earlier about the number
game on $\mathbb{Z}[\sqrt{2}]$, or, equivalently, about the group $\mathrm{Aut}(\mathbb{Z}[\sqrt{2}])$.
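
To make Example 16.9 concrete, here is a small sketch in exact rational arithmetic. The class `QSqrt2` and the map `tau` are hypothetical names introduced for this illustration; an element $a + b\sqrt{2}$ is stored as the pair $(a, b)$, and the checks run only over a finite list of sample elements.

```python
from fractions import Fraction
from itertools import product

class QSqrt2:
    """Element a + b*sqrt(2) of Q[sqrt(2)], stored as exact rationals (a, b)."""
    def __init__(self, a, b=0):
        self.a, self.b = Fraction(a), Fraction(b)
    def __add__(self, other):
        return QSqrt2(self.a + other.a, self.b + other.b)
    def __mul__(self, other):
        # (a + b r)(c + d r) = (ac + 2bd) + (ad + bc) r, where r^2 = 2
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)
    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)

def tau(z):
    """The candidate isomorphism a + b*sqrt(2) -> a - b*sqrt(2)."""
    return QSqrt2(z.a, -z.b)

values = [Fraction(n, d) for n in (-2, 0, 1, 3) for d in (1, 2)]
elements = [QSqrt2(a, b) for a, b in product(values, repeat=2)]

root = QSqrt2(0, 1)                       # sqrt(2) itself
root_squares_to_two = root * root == QSqrt2(2, 0)
sends_root_to_negative = tau(root) == QSqrt2(0, -1)
fixes_Q = all(tau(QSqrt2(a)) == QSqrt2(a) for a in values)
additive = all(tau(u + v) == tau(u) + tau(v) for u in elements for v in elements)
multiplicative = all(tau(u * v) == tau(u) * tau(v) for u in elements for v in elements)
```

On these samples, `tau` fixes the rationals, sends $\sqrt{2}$ to $-\sqrt{2}$, and respects both operations, which is exactly what Definition 16.7 asks of the isomorphism.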


Next we utilize Theorem 16.5 to prove a lemma which will be used later 
in this chapter. 


Lemma 16.10. Suppose that $\sigma : L \hookrightarrow K$ is an embedding of fields, and that
$\tilde{L}$ is a finite extension field of $L$. Then there exists an extension field $\tilde{K}$ of $K$
and an embedding $\tau : \tilde{L} \hookrightarrow \tilde{K}$ such that $\tau$ extends $\sigma$.

Proof. We proceed by induction on $[\tilde{L} : L]$.

Base Case: $[\tilde{L} : L] = 1$. Then by Exercise 15.5, we have $\tilde{L} = L$, so we
may take $\tilde{K} = K$ and $\tau = \sigma$.

Inductive Step: Suppose $[\tilde{L} : L] > 1$. Then $\exists \alpha \in \tilde{L} - L$. By Exercise 15.6,
$\alpha$ is algebraic over $L$. Set $M = \sigma(L)$, so that $\sigma$ is an isomorphism from $L$ to $M$.
Let $\tilde{\sigma} : L[x] \to M[x]$ be the isomorphism of Lemma 16.1. Set $f = \mathrm{Irr}(\alpha, L, x)$
and $g = \tilde{\sigma}(f) \in M[x] \subseteq K[x]$. Note that $g \notin K$, so by Corollary 15.12,
there exists an extension field $K'$ of $K$ such that $g$ has a root $\beta$ in $K'$. By
Theorem 16.5, there is an isomorphism $\sigma' : L[\alpha] \to M[\beta]$ which extends $\sigma$.
Now set $L' = L[\alpha]$, and (for consistency of notation) $\tilde{L}' = \tilde{L}$. Then we have
$[\tilde{L}' : L'] = [\tilde{L} : L[\alpha]] = [\tilde{L} : L] \,/\, [L[\alpha] : L] < [\tilde{L} : L]$. Thus, by inductive
hypothesis applied to the primed entities, there exists an extension field $\tilde{K}$
of $K'$ and an embedding $\tau : \tilde{L}' \hookrightarrow \tilde{K}$ such that $\tau$ extends $\sigma'$. Finally, $\tilde{K}$ is
an extension field of $K$, $\tau$ is an embedding of $\tilde{L}$ into $\tilde{K}$, and $\tau$ extends $\sigma$, as
required.




16.2 Separable Extensions 


In order to understand Theorem 16.5 better, if we are given an irreducible
polynomial $g \in M[x]$, then we would like to know how many roots $g$ has in a
given extension field $\tilde{M}$ of $M$.

The number of roots of $g$ in $\tilde{M}$ is a function of both $\tilde{M}$ and $g$. First, $\tilde{M}$
may simply not contain some of the roots of $g$; this problem can be overcome
by extending $\tilde{M}$ until $g$ factors completely, and we will consider this aspect of
the situation in the following section.

Second, we would like to know how many distinct roots g has when we do 
manage to factor g completely. For this quantity provides a fundamental limit 
on the number of roots of g in any field. Most of the time, we expect that a 
polynomial of degree n should have n distinct roots, but of course this does 
not always happen. 

For example, let $f(x) = (x-2)^2 \cdot (x+3) \cdot (x-9) \in \mathbb{R}[x]$. Then $\deg(f) = 4$,
but $f$ has only 3 distinct roots, since the number 2 is a double root of $f$. Ideas
from calculus turn out to be useful in identifying this type of situation. Taking
the derivative using the product rule, we find

$$f'(x) = 2(x-2)\cdot(x+3)\cdot(x-9) + (x-2)^2 \cdot 1 \cdot (x-9) + (x-2)^2 \cdot (x+3) \cdot 1. \tag{16.6}$$

Notice that $x - 2$ is a factor of $f'$ as well as of $f$, so we have $f(2) = f'(2) = 0$.
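
The computation above is easy to replay by machine. The sketch below stores a polynomial as a list of integer coefficients, constant term first; the helper names `poly_mul`, `derivative`, and `evaluate` are introduced here purely for illustration.

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists (constant term first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def derivative(p):
    """Term-by-term derivative: the coefficient of x^(j-1) is j * a_j."""
    return [j * p[j] for j in range(1, len(p))]

def evaluate(p, x):
    """Evaluate p at x by Horner's rule."""
    result = 0
    for c in reversed(p):
        result = result * x + c
    return result

# f = (x - 2)^2 (x + 3)(x - 9)
f = poly_mul(poly_mul([-2, 1], [-2, 1]), poly_mul([3, 1], [-9, 1]))
f_prime = derivative(f)

for root in (2, -3, 9):
    print(root, evaluate(f, root), evaluate(f_prime, root))
```

All three values are roots of `f`, but only the double root 2 is also a root of `f_prime`; at the simple roots $-3$ and $9$ the derivative is non-zero.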

In order to apply this idea more generally, we want to define the notion of 
derivative for polynomials over arbitrary commutative rings with 1. But we 
cannot use the standard definition of a derivative as a limit, since we have no 
notion of “limit” in general rings. Instead, we simply define the derivative of 
a polynomial by using the usual formula from calculus. 


Definition 16.11. Let $R$ be a commutative ring with 1, and let $f \in R[x]$.
Write $f = \sum_{j=0}^{n} a_j x^j$ with $a_j \in R$. The derivative of $f$ with respect to $x$ is the
polynomial

$$f' = \sum_{j=1}^{n} j \cdot a_j x^{j-1} \in R[x]. \tag{16.7}$$


Remark 16.12. In Equation 16.7, the notation $j \cdot a_j$ stands for $a_j + a_j + \cdots + a_j$ ($j$
terms). Recall that we use this multiplicative notation in place of exponential
notation in a commutative group whose operation is denoted $+$ (addition).
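
As an illustration of Definition 16.11 and Remark 16.12, here is a sketch of the formal derivative over the ring $\mathbb{Z}/n\mathbb{Z}$. The choice of ring and the function names are assumptions made for this example; the product $j \cdot a_j$ is deliberately computed by repeated addition, to mirror the remark.

```python
def times(j, a, n):
    """Compute j . a in Z/nZ as a + a + ... + a (j terms), as in Remark 16.12."""
    total = 0
    for _ in range(j):
        total = (total + a) % n
    return total

def formal_derivative(coeffs, n):
    """Derivative of sum a_j x^j over Z/nZ, where coeffs[j] is a_j."""
    return [times(j, coeffs[j], n) for j in range(1, len(coeffs))]

# f = 1 + 2x + x^3, viewed over Z/3Z and then over Z/5Z
f = [1, 2, 0, 1]
print(formal_derivative(f, 3))  # the x^2 term vanishes, because 3 . 1 = 0 in Z/3Z
print(formal_derivative(f, 5))
```

The same coefficient list has different derivatives over different rings: the leading term survives over $\mathbb{Z}/5\mathbb{Z}$ but disappears over $\mathbb{Z}/3\mathbb{Z}$.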

Using the above definition of derivative, the product rule for derivatives of 
polynomials is still true in our general setting: 


Lemma 16.13 (Product Rule). Let $R$ be a commutative ring with 1, and let
$f, g \in R[x]$. Then we have $(fg)' = f'g + fg'$.

Proof. Write $f = \sum_{j=0}^{n} a_j x^j$ and $g = \sum_{j=0}^{m} b_j x^j$ with $a_j, b_j \in R$. Then we
have $f' = \sum_{j=0}^{n-1} \hat{a}_j x^j$ and $g' = \sum_{j=0}^{m-1} \hat{b}_j x^j$, where $\hat{a}_j = (j+1)a_{j+1}$ and
$\hat{b}_j = (j+1)b_{j+1}$. By Definition 14.1, we have $fg = \sum_{k=0}^{n+m} c_k x^k$, where
$c_k = \sum_{j=0}^{k} a_j b_{k-j}$. So $(fg)' = \sum_{k=1}^{n+m} k c_k x^{k-1}$, and the coefficient of $x^k$ in
$(fg)'$ is $(k+1)c_{k+1} = \sum_{j=0}^{k+1} (k+1) a_j b_{k+1-j}$. Similarly, the coefficient of $x^k$
in $f'g$ is $\sum_{j=0}^{k} \hat{a}_j b_{k-j} = \sum_{j=0}^{k} (j+1) a_{j+1} b_{k-j} = \sum_{j=1}^{k+1} j\, a_j b_{k-j+1}$, and the
coefficient of $x^k$ in $fg'$ is $\sum_{j=0}^{k} a_j \hat{b}_{k-j} = \sum_{j=0}^{k} a_j (k-j+1) b_{k-j+1}$. Thus,
the coefficient of $x^k$ in $f'g + fg'$ is $\sum_{j=0}^{k+1} (k+1) a_j b_{k+1-j}$, which matches the
coefficient of $x^k$ in $(fg)'$.
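
Lemma 16.13 makes no assumption that $R$ is a domain. As a sanity check (not a proof), the following sketch verifies the product rule for every pair of polynomials of degree at most 2 over $\mathbb{Z}/4\mathbb{Z}$, a ring with zero divisors. The ring, the degree bound, and the helper names are choices made here for illustration.

```python
from itertools import product

N = 4  # work in Z/4Z, a commutative ring with 1 that is not a domain

def poly_mul(p, q):
    """Product of coefficient lists (constant term first), reduced mod N."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] = (out[i + j] + a * b) % N
    return out

def poly_add(p, q):
    """Sum of two coefficient lists of equal length, reduced mod N."""
    return [(a + b) % N for a, b in zip(p, q)]

def derivative(p):
    """Formal derivative of Definition 16.11, reduced mod N."""
    return [(j * p[j]) % N for j in range(1, len(p))]

# Every polynomial a_0 + a_1 x + a_2 x^2 with coefficients in Z/4Z.
polys = [list(c) for c in product(range(N), repeat=3)]

product_rule_holds = all(
    derivative(poly_mul(f, g))
    == poly_add(poly_mul(derivative(f), g), poly_mul(f, derivative(g)))
    for f in polys for g in polys
)
print(product_rule_holds)
```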


Definition 16.14. Let $F$ be a subfield of $K$, and let $\alpha \in K$. Let $f \in F[x]$.
We say that $\alpha$ is a repeated root or multiple root of $f$ if $(x - \alpha)^2$ divides $f$ in
$K[x]$. The multiplicity of $\alpha$ as a root of $f$ is the greatest integer $m$ such that
$(x - \alpha)^m$ divides $f$ in $K[x]$.


Proposition 16.15. Let $F$ be a subfield of $K$, and let $\alpha \in K$. Let $f \in F[x]$.
Then

(i) $\alpha$ is a repeated root of $f$ iff $f(\alpha) = f'(\alpha) = 0$.

(ii) If $f$ is irreducible in $F[x]$ and $\alpha$ is a repeated root of $f$, then $f'(x) = 0$.


Proof. (i) ($\Rightarrow$): Suppose that $\alpha$ is a repeated root of $f$. This means that
$f = (x-\alpha)^2 \cdot g$ for some $g \in K[x]$. So we have $f(\alpha) = (\alpha-\alpha)^2 \cdot g(\alpha) = 0$. Moreover,
using the Product Rule, we have $f'(x) = ((x-\alpha)^2)' \cdot g(x) + (x-\alpha)^2 \cdot g'(x)$
$= 2(x-\alpha)g(x) + (x-\alpha)^2 \cdot g'(x)$, and so $f'(\alpha) = 0$ also.

($\Leftarrow$): Suppose that $f(\alpha) = f'(\alpha) = 0$. Then by Theorem 14.24, we can say
that $x - \alpha$ divides $f$ in $K[x]$. So $f = (x-\alpha) \cdot h$ for some $h \in K[x]$. Taking
derivatives, we find $f' = 1 \cdot h + (x-\alpha) \cdot h'$. Therefore, $0 = f'(\alpha) = h(\alpha)$.
By Theorem 14.24 again, we have $h = (x-\alpha) \cdot g$ for some $g \in K[x]$. Thus
$f = (x-\alpha)^2 \cdot g$, so $\alpha$ is a repeated root of $f$.

(ii) Suppose that $f$ is irreducible in $F[x]$ and $\alpha$ is a repeated root of $f$.
Then by part (i), we have $f(\alpha) = f'(\alpha) = 0$. Consider the evaluation-at-$\alpha$
map $\varepsilon_\alpha : F[x] \to K$. By Proposition 15.4 and Definition 15.5, we know that
$\ker(\varepsilon_\alpha) = (g)$ where $g = \mathrm{Irr}(\alpha, F, x)$ is the irreducible polynomial of $\alpha$ over
$F$. Now both $f$ and $f'$ are in $\ker(\varepsilon_\alpha)$, so we have $g \mid f$ and $g \mid f'$ in $F[x]$.
Therefore, $f = g \cdot h$ for some polynomial $h \in F[x]$. Since $f$ and $g$ are irreducible
in $F[x]$, we must have $h \in (F[x])^\times = F^\times$. Thus $\deg(f) = \deg(g)$. But since
$g \mid f'$, we can write $f' = g \cdot q$ for some $q \in F[x]$. Now $\deg(f') < \deg(f)$, while
if $q \ne 0$ then $\deg(g \cdot q) = \deg(g) + \deg(q) = \deg(f) + \deg(q) \ge \deg(f)$, and
thus $\deg(g \cdot q) > \deg(f')$, a contradiction. It follows that $q = 0$, and so $f' = 0$,
as desired.


Example 16.16. Proposition 16.15 (ii) leads us to ask, when does $f'(x) = 0$
in $F[x]$? Certainly if $f \in F$, that is, if $f$ is constant, then $f' = 0$. In classical
calculus (that is, if $F \le \mathbb{C}$) this is the only way to get $f'(x) = 0$.

But consider $f(x) = x^p - 1 \in \mathbb{F}_p[x]$, where $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$ and $p$ is a prime
of $\mathbb{Z}$. Here we have $f'(x) = p x^{p-1} - 0 = 0$, yet $f \notin \mathbb{F}_p$. For example, when
$p = 2$, we have $f(x) = x^2 - 1 = (x-1)^2$. More generally, it turns out that
$x^p - 1 = (x-1)^p$ in $\mathbb{F}_p[x]$ (see Corollary 21.21).

We note that none of the polynomials $f$ given in this example is actually
irreducible. To find an example of an irreducible polynomial $f$ such that
$f'(x) = 0$, we must look further afield, to so-called function fields; see
Exercise 22.14.
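
The identity $x^p - 1 = (x-1)^p$ in $\mathbb{F}_p[x]$ can be checked directly for small primes by expanding $(x-1)^p$ with binomial coefficients and reducing mod $p$. The sketch below does this; the function names are introduced here for illustration, and the final line uses the non-prime modulus 4 only as a contrast.

```python
from math import comb

def x_minus_one_power(p):
    """Coefficients (constant term first) of (x - 1)^p, reduced mod p."""
    return [(comb(p, k) * (-1) ** (p - k)) % p for k in range(p + 1)]

def x_to_p_minus_one(p):
    """Coefficients (constant term first) of x^p - 1, reduced mod p."""
    return [(-1) % p] + [0] * (p - 1) + [1]

def derivative_mod(coeffs, p):
    """Formal derivative with coefficients reduced mod p."""
    return [(j * coeffs[j]) % p for j in range(1, len(coeffs))]

for p in (2, 3, 5, 7, 11):
    same = x_minus_one_power(p) == x_to_p_minus_one(p)
    zero_derivative = all(c == 0 for c in derivative_mod(x_to_p_minus_one(p), p))
    print(p, same, zero_derivative)

# Contrast: the identity fails modulo 4, which is not prime.
print(x_minus_one_power(4), x_to_p_minus_one(4))
```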


To capture the condition that a given field extension is well-behaved 
with respect to the number of roots of its associated irreducible polynomi- 
als, namely, that the number of roots is equal to the degree, we make the 
following definition. 


Definition 16.17. Let $K$ be an algebraic extension field of $F$. For an individual
element $\alpha \in K$, we say that $\alpha$ is separable over $F$ if $\mathrm{Irr}(\alpha, F, x)$ has
no repeated roots. We say that $K$ is separable over $F$ if for all $\alpha \in K$, $\alpha$ is
separable over $F$.


Remark 16.18. In Definition 16.17, we did not specify where to look for re- 
peated roots. It turns out that this does not matter. By Exercise 16.6, if an 
irreducible polynomial f = Irr(a, F,x) has a repeated root in any field what- 
soever, then all roots of f are repeated roots, including a itself. This accords 
with our principle that all roots of an irreducible polynomial look the same 
over the ground field. 


Example 16.19. Suppose that $F$ and $K$ are fields between $\mathbb{Q}$ and $\mathbb{C}$, so that
$\mathbb{Q} \le F \le K \le \mathbb{C}$, and that $K$ is algebraic over $F$. Let $\alpha \in K$, and let
$f = \mathrm{Irr}(\alpha, F, x)$. Since $f$ is irreducible in $F[x]$, we must have $f \notin F$, because
the definition of irreducible excludes 0 and units. Now since $F \le \mathbb{C}$, we have
$f \in \mathbb{C}[x] - \mathbb{C}$, so $f'(x) \ne 0$. Thus by Proposition 16.15, $f$ has no repeated
roots. We conclude that $K$ is separable over $F$.

Looking at Example 16.16, the reader may notice that what allowed $f'$
to be 0 without $f$ being constant was that, in $\mathbb{F}_p[x]$, multiplying something
non-zero by a non-zero integer could turn out to give zero. This does not
contradict the fact that $\mathbb{F}_p[x]$ is a domain, as guaranteed by Theorem 14.10;
for the ring of ordinary integers $\mathbb{Z}$ is not a subring of $\mathbb{F}_p[x]$. For convenience,
set $R = \mathbb{F}_p[x]$. With a little creativity, we realize that although $\mathbb{Z} \not\le R$, there
is a natural-seeming map from $\mathbb{Z}$ to $R$ which sends $1_{\mathbb{Z}}$ to $1_R$ and in general
sends $n$ to $n \cdot 1_R$. What went wrong in allowing $f'$ to be 0 without $f$ being
constant was that the kernel of this map from $\mathbb{Z}$ to $R$ was non-trivial. This
observation generalizes as follows.


Lemma 16.20. Let $R$ be a commutative ring with 1. Then the natural map
$\chi : \mathbb{Z} \to R$ given by the formula $n \mapsto n \cdot 1_R$ is a ring homomorphism.


Proof. Exercise 16.16.


Definition 16.21. Let $R$ be a commutative ring with 1. Let $\chi : \mathbb{Z} \to R$,
$n \mapsto n \cdot 1_R$ be the natural map. The characteristic of $R$ is the unique
non-negative integer $c$ such that $\ker(\chi) = (c)$. We write $\mathrm{char}(R) = c$.




Example 16.22. In the case of an ordinary number system $R$, the natural map
from $\mathbb{Z}$ to $R$ is an embedding, corresponding to the subring relation $\mathbb{Z} \le R$.
Consequently, the ordinary number systems, such as $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$, have
characteristic 0. On the other hand, the "modular" ring $\mathbb{Z}/n\mathbb{Z}$ has characteristic
$n$; the term natural map is actually defined two different ways in this case, but
they coincide to give the same mapping from $\mathbb{Z}$ to $\mathbb{Z}/n\mathbb{Z}$, which has kernel
$n\mathbb{Z}$.
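
For the modular rings of Example 16.22, the characteristic can be found by brute force: add $1$ to itself until the sum returns to $0$. The sketch below does exactly that for $\mathbb{Z}/n\mathbb{Z}$ with $n \ge 1$; the function name is hypothetical. Note that this search would never terminate in a ring of characteristic 0, so it is only meaningful when the kernel of the natural map is non-trivial.

```python
def characteristic_of_Z_mod(n):
    """Smallest c >= 1 with c . 1 = 0 in Z/nZ, found by repeated addition."""
    total, c = 1 % n, 1
    while total != 0:
        total = (total + 1) % n
        c += 1
    return c

for n in (1, 2, 6, 7, 12):
    print(n, characteristic_of_Z_mod(n))
```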


Finally we can give a relatively simple condition on a field which is sufficient 
(but not necessary!) to guarantee separability. 


Proposition 16.23. If $F$ is a field of characteristic 0, then every algebraic
extension field of $F$ is separable over $F$.


Proof. Suppose that $\mathrm{char}(F) = 0$ and that $K$ is an algebraic extension field of
$F$. Let $\alpha \in K$, and set $f = \mathrm{Irr}(\alpha, F, x) \in F[x]$. By the definition of irreducible
polynomial, $f$ is monic and non-constant, so the leading term of $f$ is $1_F \cdot x^n$
where $n = \deg(f) \ge 1$. Thus, the term of $f'$ of degree $n-1$ is $n \cdot 1_F \cdot x^{n-1}$.
Because $\mathrm{char}(F) = 0$, we have $n \cdot 1_F \ne 0_F$, and thus $f' \ne 0$ in $F[x]$. By
Proposition 16.15, $f$ cannot have repeated roots.


16.3 Normal Extensions


We turn again to Theorem 16.5 and ask, how can we guarantee the maximum
possible number of embeddings of $L[\alpha]$ into $\tilde{M}$? In the previous section, we
defined the notion of separability, which ensures that the number of roots
of an irreducible polynomial is equal to its degree. A non-zero polynomial
over a field can never have more roots than its degree, by Theorem 14.25.
The separability of the field extension $\tilde{M}/M$ thus guarantees the maximum
possible number of embeddings of $L[\alpha]$ into $\tilde{M}$, provided in addition that
$\tilde{M}$ contains all the roots of our irreducible polynomial $g$. This motivates the
following definition.


Definition 16.24. Let $K$ be an algebraic extension field of $F$. We say that
$K$ is normal over $F$ if for all $\alpha$ in $K$, the polynomial $\mathrm{Irr}(\alpha, F, x)$ factors
completely over $K$.


Let us imagine how we might construct a normal extension of a given field
$F$. We could start with a monic polynomial $f$ which is irreducible in $F[x]$,
and then form a splitting field $K$ for $f$ over $F$. At this point, we are at least
assured that $f$ factors completely over $K$. So if $\alpha$ is a root of $f$ in $K$, then
$\mathrm{Irr}(\alpha, F, x)$ (which is $f$) factors completely over $K$. But what of the irreducible
polynomials of the other elements of $K$? In general, $K$ could be infinite, while
$f$ has only finitely many roots. Must we continue to extend $K$ by adjoining
the remaining roots of the irreducible polynomials of all these other elements?
Fortunately, the answer is no.


Proposition 16.25. Let $K$ be a finite extension field of $F$. Then the following
conditions are equivalent:

(1) $K$ is a splitting field over $F$ for some non-zero polynomial $f \in F[x]$.

(2) For all extension fields $\tilde{K}$ of $K$, and all embeddings $\sigma : K \hookrightarrow \tilde{K}$ over
$F$, we have $\sigma(K) \subseteq K$.

(3) $K$ is normal over $F$.


Proof. [(1)$\Rightarrow$(2)] Suppose that $K$ is a splitting field for $f \in F[x] - (0)$. Then
we can write $f(x) = c \prod_{j=1}^{n} (x - \rho_j)$ for some $c \in F$, $\rho_j \in K$; and by Theorem
15.21 we have $K = F[\rho_1, \ldots, \rho_n]$. Let $\tilde{K}$ be an extension field of $K$, and
suppose that $\sigma : K \hookrightarrow \tilde{K}$ is an embedding of $K$ into $\tilde{K}$ over $F$. We note
that $f$ need not be irreducible in $F[x]$, but that for each $j$, we must have
$\mathrm{Irr}(\rho_j, F, x)$ divides $f$ in $F[x]$, since $f(\rho_j) = 0$. By considering the restriction
of $\sigma$ to $F[\rho_j]$ and applying Theorem 16.5, we see that for each $j$, $\sigma(\rho_j)$ must
be a root of $f$; because $\tilde{K}$ is a domain, this forces $\sigma(\rho_j)$ to be one of the $\rho_i$'s.
Since $K = F[\rho_1, \ldots, \rho_n]$, and $\sigma$ acts as the identity map on $F$, it follows that
$\sigma(K) \subseteq K$, as required.

[(2)$\Rightarrow$(3)] Suppose that (2) holds. Let $\alpha \in K$. By Exercise 15.6, we know
that $\alpha$ is algebraic over $F$; set $f = \mathrm{Irr}(\alpha, F, x)$. Then we have $f \in F[x] \subseteq K[x]$.
By Exercise 15.10, there is a finite extension field $K'$ of $K$ such that $f$ factors
completely over $K'$. So we can write $f = c \cdot \prod_{j=1}^{n} (x - \alpha_j)$ for some $c \in F$ and
$\alpha_j \in K'$. Now fix a value $j$ such that $1 \le j \le n$. By Theorem 16.5, there is
an isomorphism $\sigma : F[\alpha] \to F[\alpha_j]$ which extends the identity map on $F$ and
sends $\alpha$ to $\alpha_j$. Using Lemma 16.10 with $L := F[\alpha]$, $\tilde{L} := K$, and $K := K'$, we
can find an extension field $\tilde{K}$ of $K'$ and an embedding $\tau : K \hookrightarrow \tilde{K}$ such that
$\tau$ extends $\sigma$. By (2), we must have $\tau(K) \subseteq K$, hence in particular $\tau(\alpha) \in K$.
Since $\tau$ extends $\sigma$, this says $\alpha_j \in K$. Since $j$ was arbitrary, we can say that $f$
factors completely over $K$. Thus $K$ is normal over $F$.

[(3)$\Rightarrow$(1)] Suppose that $K$ is normal over $F$. Since $[K : F] < \infty$, we can
write $K = F[\alpha_1, \ldots, \alpha_n]$ for some elements $\alpha_1, \ldots, \alpha_n \in K$ (by Exercise 15.7).
Set $f_i = \mathrm{Irr}(\alpha_i, F, x)$ for each $i$ between 1 and $n$, and set $f = f_1 \cdot f_2 \cdots f_n \in
F[x]$. By definition of normal, each $f_i$ factors completely over $K$, say as

$$f_i = c_i \prod_{j=1}^{d_i} (x - \rho_{i,j}) \tag{16.8}$$

with $c_i \in F$ and $\rho_{i,j} \in K$, where $d_i = \deg(f_i)$. It follows that $f$ also factors
completely over $K$. By Theorem 15.21, the field $L := F[\{\rho_{i,j}\}] \subseteq K$ is the unique
splitting field for $f$ in $K$. But since $f_i = \mathrm{Irr}(\alpha_i, F, x)$, we have $f_i(\alpha_i) = 0$ for each $i$, and
thus by Equation 16.8 we must have $\alpha_i = \rho_{i,j}$ for some $j$. It follows that
$F[\alpha_1, \ldots, \alpha_n] \subseteq L$. Hence $L = K$, so $K$ is a splitting field for $f$ over $F$.


Example 16.26. Let $F = \mathbb{Q}[\sqrt{2}]$, $K = \mathbb{Q}[\sqrt[4]{2}]$, and $L = \mathbb{Q}[\sqrt[4]{2}, i]$. Then $F$
is normal over $\mathbb{Q}$, since $F$ is the splitting field over $\mathbb{Q}$ of the polynomial
$x^2 - 2 \in \mathbb{Q}[x]$, whose roots are $\sqrt{2}$ and $-\sqrt{2}$. Also, $K$ is the splitting field
over $F$ of the polynomial $g := x^2 - \sqrt{2} \in F[x]$, since $K = F[\sqrt[4]{2}]$ and the
roots of $g$ are $\sqrt[4]{2}$ and $-\sqrt[4]{2}$. Therefore, $K$ is normal over $F$. But $K$ is not
normal over $\mathbb{Q}$. To see this, we observe that $\mathrm{Irr}(\sqrt[4]{2}, \mathbb{Q}, x) = x^4 - 2 =: f$ by
Example 15.27; but $f$ does not factor completely over $K$, since (by Exercise
15.12) $L$ is a splitting field for $f$ over $\mathbb{Q}$, and $K \subsetneq L$. A factorization of $f$ into
irreducibles in $K[x]$ is $x^4 - 2 = (x^2 + \sqrt{2})(x - \sqrt[4]{2})(x + \sqrt[4]{2})$. In fact, $L$ is the
smallest extension field of $K$ in $\mathbb{C}$ which is normal over $\mathbb{Q}$.
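
A floating-point sketch can illustrate why $K = \mathbb{Q}[\sqrt[4]{2}]$ misses two of the roots of $x^4 - 2$: since $K \subseteq \mathbb{R}$, any non-real root lies outside $K$. The code below lists the four complex roots numerically and counts the real ones. It is an illustration under floating-point tolerance (the tolerance `1e-9` and the sample point are arbitrary choices), not a proof.

```python
import cmath

r = 2 ** 0.25  # the real fourth root of 2

# The four complex roots of x^4 - 2: r, i*r, -r, -i*r.
roots = [r * cmath.exp(1j * cmath.pi * k / 2) for k in range(4)]

def f(z):
    """The polynomial x^4 - 2."""
    return z ** 4 - 2

def factored(z):
    """The factorization (x^2 + sqrt 2)(x - r)(x + r) from Example 16.26."""
    return (z * z + 2 ** 0.5) * (z - r) * (z + r)

residuals = [abs(f(z)) for z in roots]
real_roots = [z for z in roots if abs(z.imag) < 1e-9]

sample = 1.5 + 0.5j
print(max(residuals) < 1e-9, len(real_roots), abs(f(sample) - factored(sample)) < 1e-9)
```

Only two of the four roots are real, matching the two linear factors of $f$ in $K[x]$; the quadratic factor $x^2 + \sqrt{2}$ accounts for the remaining pair $\pm i\sqrt[4]{2}$.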


16.4 Galois Extensions


When we combine the two conditions defined in the previous two sections, 
namely, separability and normality, then we get the “best” kind of field ex- 
tensions: that is, we get field extensions with the greatest number of auto- 
morphisms, in a precise sense (see Exercise 16.14). Many other benefits also 
follow. 


Definition 16.27. Let $K$ be an extension field of $F$. We say that $K$ is Galois
over $F$ if $K$ is both normal and separable over $F$.


Notation 16.28. If $K$ is a finite Galois extension of $F$, then we denote
$\mathrm{Aut}(K/F)$ by $\mathrm{Gal}(K/F)$, and call this the Galois group of $K$ over $F$.


One nice feature of Galois extensions is that the top field in a finite Galois 
extension is Galois over any intermediate field, all the way down to the bottom 
field in the extension. This is the content of the following lemma. 


Lemma 16.29. Let $K$ be a finite Galois extension of $F$. Let $L$ be a field
such that $F \le L \le K$. Then $K$ is Galois over $L$, and we have $\mathrm{Gal}(K/L) \le
\mathrm{Gal}(K/F)$.


Proof. [Show: $K$ is normal and separable over $L$] Let $\alpha \in K$. Since $K$ is
algebraic over $F$ (e.g., by definition of normal), we can set $f = \mathrm{Irr}(\alpha, F, x)$.
Now $f \in F[x] \subseteq L[x]$ and $f \ne 0$, but $f(\alpha) = 0$. Thus, $\alpha$ is algebraic over $L$,
and we set $g = \mathrm{Irr}(\alpha, L, x)$. Since $f \in L[x]$ and $f(\alpha) = 0$, we have $g \mid f$ in $L[x]$.
Because $K$ is normal over $F$, we can factor $f$ completely in $K[x]$. It follows
(by Exercise 16.20) that $g$ also factors completely in $K[x]$, using a subset of
the linear factors of $f$; thus, $K$ is normal over $L$. Since $K$ is separable over $F$,
$f$ has no repeated roots in $K$, and so $g$ also has no repeated roots. Thus, $K$
is Galois over $L$. If $\sigma \in \mathrm{Gal}(K/L)$, then we have $\sigma \in \mathrm{Aut}(K)$, and $\sigma(a) = a$
for all $a \in L$; hence $\sigma(a) = a$ for all $a \in F$, so $\sigma \in \mathrm{Gal}(K/F)$.


We are almost ready for the main result of this chapter, the Fundamental
Theorem of Galois Theory. The setting is a finite Galois extension $K/F$, with
Galois group $G$. The result tells us that the intermediate fields between $F$ and
$K$ exactly correspond to the subgroups of the finite group $G$, even so far as
to say that normal extensions of $F$ in $K$ correspond to (has the reader dared
to guess that such an amazing thing could be true?) normal subgroups of $G$.

The correspondence between intermediate fields and subgroups requires a
little explanation. Given an intermediate field $L$ with $F \le L \le K$, we can
form the group $\mathrm{Gal}(K/L)$, using Lemma 16.29. From the other side, given a
subgroup $H \le G$, we can form the set of all elements of $K$ which are fixed by
every element of $H$:


Definition 16.30. Let $K/F$ be a finite Galois extension, and let $H \le
\mathrm{Gal}(K/F)$. The fixed field of $H$ is the set

$$K^H := \{a \in K : \forall \sigma \in H,\ \sigma(a) = a\}. \tag{16.9}$$


Lemma 16.31. Let $K/F$ be a finite Galois extension, and let $H \le \mathrm{Gal}(K/F)$.
Then $K^H$ is a field, and $F \le K^H \le K$.


Proof. Exercise 16.21. 


One final remark is in order before the main theorem. We say that the
correspondence just described is order-reversing. This is because if $H \le M \le
G$, then $K^H \supseteq K^M$; and if $F \le L \le M \le K$, then $\mathrm{Gal}(K/L) \supseteq \mathrm{Gal}(K/M)$.
These assertions are straightforward to check, and should be verified by the
reader.


Theorem 16.32 (Fundamental Theorem of Galois Theory). Let $K/F$ be a
finite Galois extension. Let $G = \mathrm{Gal}(K/F)$. Then:

(1) $|G| = [K : F]$.

(2) Define a function $\gamma$ from the set of all fields between $F$ and $K$ to the
set of all subgroups of $G$ by

$$\gamma : \{L : F \le L \le K\} \to \{H : H \le G\}, \qquad \gamma(L) = \mathrm{Gal}(K/L).$$

Then $\gamma$ is a bijection, and for $H \le G$, we have $\gamma^{-1}(H) = K^H$.

(3) For any subgroup $H$ of $G$, we have $H \trianglelefteq G$ iff $K^H$ is normal over $F$,
in which case $K^H$ is Galois over $F$ and we have $\mathrm{Gal}(K^H/F) \cong G/H$.


Proof. (1) We will prove that for every intermediate field $\tilde{L}$ with $F \le \tilde{L} \le K$,
the number of embeddings of $\tilde{L}$ into $K$ over $F$ is exactly $[\tilde{L} : F]$. We proceed
by induction on $[\tilde{L} : F]$. When $[\tilde{L} : F] = 1$, then $\tilde{L} = F$ (by Exercise
15.5), and the unique embedding of $\tilde{L}$ into $K$ over $F$ is the identity map on
$F$. Inductively, suppose that $F \le \tilde{L} \le K$ and that our hypothesis holds for
every field $L$ with $[L : F] < [\tilde{L} : F]$. We can realize $\tilde{L}$ as the top
field in a tower of simple extensions starting at $F$, by Exercise 15.7. So in
particular, we can write $\tilde{L} = L[\alpha]$ for some field $L$ and some $\alpha \in K$, where
$F \le L < \tilde{L} \le K$. By Lemma 16.29, $K$ is Galois over $L$; hence $\alpha$ is algebraic
over $L$, and we set $f = \mathrm{Irr}(\alpha, L, x)$. Let $d = \deg(f)$. By Lemma 15.25, we
have $[\tilde{L} : L] = d$. Suppose that $\sigma$ is an embedding of $L$ into $K$ over $F$.
Set $M = \sigma(L)$. Then $M$ is a field, $F \le M \le K$, and $\sigma : L \to M$ is an
isomorphism. Set $g = \tilde{\sigma}(f) \in M[x]$, where $\tilde{\sigma}$ is the isomorphism of Lemma
16.1. Then $\deg(g) = \deg(f) = d$. By Lemma 16.10, we can extend $\sigma$ to an
embedding $\tau : K \to \tilde{K}$ for some extension field $\tilde{K}$ of $K$; and by Proposition
16.25, we can take $\tilde{K} = K$. By Exercise 16.22, $\tau$ is an automorphism of $K$;
since $\tau$ extends $\sigma$, then $\tau \in \mathrm{Aut}(K/F)$. Let $\beta = \tau(\alpha) \in K$. Then $\tau(\alpha)$ is
a root of $g$ by Exercise 16.4, so $g = \mathrm{Irr}(\beta, M, x)$ by Exercise 16.3. Now $K$ is
Galois over $M$ (by Lemma 16.29), so $g$ factors completely over $K$ with distinct
roots. So by Theorem 16.5, the number of extensions of $\sigma$ to an embedding
of $\tilde{L}$ into $K$ is equal to the number of roots of $g$ in $K$, which is equal to the
degree $d$ of $g$ (note how both normality and separability were used here). Since
our inductive hypothesis holds for $L$, the number of distinct embeddings of
$L$ into $K$ over $F$ is equal to $k := [L : F]$. Therefore, the total number of
embeddings of $\tilde{L}$ into $K$ over $F$ is equal to $d \cdot k$. But by Lemma 15.26, we also
have $[\tilde{L} : F] = [\tilde{L} : L] \cdot [L : F] = d \cdot k$. This completes the inductive step.
To complete the proof of (1), we note that an embedding of $K$ into $K$ over $F$
is the same thing as an automorphism of $K$ over $F$, by Exercise 16.22.

(2) We first show that $\gamma$ is injective. So suppose that $L_1$ and $L_2$ are fields
between $F$ and $K$ such that $\gamma(L_1) = \gamma(L_2)$; that is, $\mathrm{Gal}(K/L_1) = \mathrm{Gal}(K/L_2)$.
Assume for a contradiction that $L_1 \ne L_2$. Then without loss of generality, we
may suppose $\exists \alpha \in L_1 - L_2$. Set $L = L_2[\alpha]$. Then we have $F \le L_2 < L \le K$.
Now we have $\mathrm{Gal}(K/L) \subseteq \mathrm{Gal}(K/L_2)$; and since $[K : L_2] > [K : L]$, then
part (1) tells us that in fact $\mathrm{Gal}(K/L) \subsetneq \mathrm{Gal}(K/L_2)$. But let $\sigma \in \mathrm{Gal}(K/L_2)$.
Then we have $\sigma \in \mathrm{Gal}(K/L_1)$ as well, and so $\sigma(\alpha) = \alpha$. It follows that $\sigma$ must
fix every element of $L_2[\alpha] = L$, and so $\sigma \in \mathrm{Gal}(K/L)$. We have deduced that
$\mathrm{Gal}(K/L_2) \subseteq \mathrm{Gal}(K/L)$, a contradiction. Thus, $\gamma$ is injective.

At this point, we note that the injectivity of $\gamma$ implies that there are only
finitely many fields $L$ between $F$ and $K$; this is because the finite group $G$ can
have only finitely many subgroups. From Exercises 15.13 and 15.14, it now
follows that $K$ is a simple extension of $F$; that is, $\exists \alpha \in K$ such that $K = F[\alpha]$.

To complete the proof of (2), we will show that for every $H \le G$, we
have $\mathrm{Gal}(K/K^H) = H$. Note that for each $\sigma \in H$, $\sigma$ acts as the identity
map on $K^H$, by definition of $K^H$; and thus, $\sigma \in \mathrm{Gal}(K/K^H)$. Therefore,
we have $H \le \mathrm{Gal}(K/K^H)$. We will now apply the Kaleidoscope Principle
(see Exercise 5.9) to construct what "should" be the irreducible polynomial
of $\alpha$ over $K^H$. Namely, set $f = \prod_{\sigma \in H} (x - \sigma(\alpha)) \in K[x]$. Let $\tau \in H$; then
(identifying $\tau$ with its extension $\tilde{\tau} : K[x] \to K[x]$ of Lemma 16.1) we have

$$\tau(f) = \tau\Big(\prod_{\sigma \in H} (x - \sigma(\alpha))\Big) = \prod_{\sigma \in H} \tau(x - \sigma(\alpha)) = \prod_{\sigma \in H} \big(\tau(x) - \tau(\sigma(\alpha))\big) = \prod_{\sigma \in H} \big(x - (\tau\sigma)(\alpha)\big).$$

As $\sigma$ varies over the elements of $H$, the quantity $\tau\sigma$ also
varies over the elements of $H$ (although these elements are permuted according
to the map $\pi_\tau$ of Cayley's Theorem, Theorem 10.9). This observation lets us
conclude that $\tau(f) = f$ (Wow!). Since $\tau$ was an arbitrary element of $H$, we
have $f \in K^H[x]$, by definition of $K^H$. Let $g = \mathrm{Irr}(\alpha, K^H, x) \in K^H[x]$. Since
$e_G \in H$, then $\alpha$ is a root of $f$ in $K$. It follows that $g$ divides $f$ in $K^H[x]$. Hence
(using Lemma 15.25) we have $[K^H[\alpha] : K^H] = \deg(g) \le \deg(f) = |H|$.
But since $F[\alpha] = K$ and $F \le K^H \le K$, we have $K^H[\alpha] = K$. Therefore,
$|H| \ge [K : K^H] = |\mathrm{Gal}(K/K^H)|$ by part (1). Since $H \le \mathrm{Gal}(K/K^H)$,
this forces $H = \mathrm{Gal}(K/K^H)$, as desired. We note incidentally that this gives
$\deg(g) = |H| = \deg(f)$, so in fact we do have $g = f$.
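
The construction $f = \prod_{\sigma \in H}(x - \sigma(\alpha))$ can be watched in a tiny case. The sketch below takes $K = \mathbb{Q}[\sqrt{2}]$, $H = G$ consisting of the identity and the map $\sqrt{2} \mapsto -\sqrt{2}$, and $\alpha = 1 + \sqrt{2}$. These choices, the pair encoding $(a, b)$ for $a + b\sqrt{2}$, and all helper names are assumptions made for this illustration only.

```python
from fractions import Fraction

# Elements of Q[sqrt 2] are pairs (a, b) meaning a + b*sqrt(2).
def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def mul(u, v):
    return (u[0] * v[0] + 2 * u[1] * v[1], u[0] * v[1] + u[1] * v[0])

def neg(u):
    return (-u[0], -u[1])

def identity(u):
    return u

def conjugation(u):
    return (u[0], -u[1])

H = [identity, conjugation]

def poly_mul(p, q):
    """Multiply polynomials whose coefficients are pairs (constant term first)."""
    out = [(Fraction(0), Fraction(0))] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] = add(out[i + j], mul(a, b))
    return out

alpha = (Fraction(1), Fraction(1))   # 1 + sqrt(2)
one = (Fraction(1), Fraction(0))

f = [one]
for sigma in H:
    f = poly_mul(f, [neg(sigma(alpha)), one])   # multiply by x - sigma(alpha)

coefficients_are_rational = all(c[1] == 0 for c in f)
fixed_by_H = all([sigma(c) for c in f] == f for sigma in H)
print(f, coefficients_are_rational, fixed_by_H)
```

The product works out to $x^2 - 2x - 1$: every coefficient has zero $\sqrt{2}$-part, so $f$ has coefficients in the fixed field, as the argument above predicts.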

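(3) ($\Rightarrow$) Suppose that $H \trianglelefteq G$. Let $\beta \in K^H$. Then $\beta \in K$, so $\beta$ is algebraic
over $F$, and we can set $q = \mathrm{Irr}(\beta, F, x)$. Since $K$ is normal over $F$, we know
that $q$ factors completely over $K$; thus, to show that $K^H$ is normal over $F$,
it is enough to show that any root of $q$ in $K$ actually lies in $K^H$. So let $\tilde{\beta}$ be
a root of $q$ in $K$. Let $h = \prod_{\sigma \in G} (x - \sigma(\beta))$. Note that $\gamma(K^G) = \mathrm{Gal}(K/K^G)
= G = \mathrm{Gal}(K/F)$, so $K^G = F$ by part (2). Furthermore, we have $\tau(h) = h$ for
any $\tau \in G$ (as in the proof of (2)), and so $h \in K^G[x] = F[x]$. Since $\mathrm{id}_K \in G$,
then $\beta$ is a root of $h$, so $h(\beta) = 0$. It follows that $q$ divides $h$ in $F[x]$. Since
$\tilde{\beta}$ is a root of $q$, then $\tilde{\beta}$ must also be a root of $h$, and so $\tilde{\beta} = \sigma(\beta)$ for some
$\sigma \in G$. [Show $\tilde{\beta} \in K^H$] Let $\tau \in H$. [Show $\tau(\tilde{\beta}) = \tilde{\beta}$] Then $\tau(\tilde{\beta}) = \tau(\sigma(\beta))
= (\tau\sigma)(\beta)$. Since $H \trianglelefteq G$, we have $H\sigma = \sigma H$ (by Lemma 7.25), so we can
write $\tau\sigma = \sigma\tilde{\tau}$ for some $\tilde{\tau} \in H$. Therefore, $(\tau\sigma)(\beta) = (\sigma\tilde{\tau})(\beta) = \sigma(\tilde{\tau}(\beta))
= \sigma(\beta)$ (since $\beta \in K^H$ and $\tilde{\tau} \in H$) $= \tilde{\beta}$. We have shown that $\tau(\tilde{\beta}) = \tilde{\beta}$ for
any $\tau \in H$, and thus $\tilde{\beta} \in K^H$. But $\tilde{\beta}$ was an arbitrary root of $q$ in $K$. Thus,
$q$ factors completely over $K^H$, and so $K^H$ is indeed normal over $F$.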

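($\Leftarrow$) Suppose that $K^H$ is normal over $F$. Let $\sigma \in G$ and $\tau \in H$. [Show
$\sigma\tau\sigma^{-1} \in H$] Set $\psi = \sigma\tau\sigma^{-1} \in G$. [Strategy: Show that $\psi$ fixes every element
of $K^H$; part (2) will then show that $\psi \in H$] Let $\alpha \in K^H$. Since $K^H$ is
normal over $F$, then $\sigma^{-1}(K^H) \subseteq K^H$ by Proposition 16.25 (2). Therefore,
$\sigma^{-1}(\alpha) \in K^H$. Since $\tau \in H$, we can say $\tau(\sigma^{-1}(\alpha)) = \sigma^{-1}(\alpha)$. Thus, we have
$\psi(\alpha) = (\sigma\tau\sigma^{-1})(\alpha) = \sigma(\sigma^{-1}(\alpha)) = \alpha$. Since $\alpha$ was an arbitrary element of
$K^H$, we have shown that $\psi \in \mathrm{Gal}(K/K^H)$. By part (2), this gives $\psi \in H$,
which was our goal.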

We have established the "iff" part of (3). Now we suppose that $K^H$ is
normal over $F$, and first prove that $K^H$ is Galois over $F$. Let $\alpha \in K^H$.
Then $\alpha \in K$, so $\mathrm{Irr}(\alpha, F, x)$ has no repeated roots (since $K$ is separable over
$F$). This shows that $K^H$ is separable over $F$, and hence Galois over $F$. Set
$Q = \mathrm{Gal}(K^H/F)$. Define a function $\omega : G \to Q$ by $\sigma \mapsto \sigma|_{K^H}$. We are
guaranteed by Proposition 16.25 (2) that this restriction really does map $K^H$
into $K^H$, and hence defines an element of $\mathrm{Gal}(K^H/F)$ by Exercise 16.22. It
is immediate that $\omega$ is a group homomorphism. To show that $\omega$ is surjective,
let $\tau \in Q$. By Lemma 16.10, there is an extension of $\tau$ to an embedding
$\tilde{\tau} : K \to \tilde{K}$, for some extension field $\tilde{K}$ of $K$. But by Proposition 16.25 (2),
we must have $\tilde{\tau}(K) \subseteq K$, so we have $\tilde{\tau} \in G$ (by Exercise 16.22) and $\omega(\tilde{\tau}) = \tau$;
thus, $\omega$ is surjective. We have $\ker(\omega) = \{\sigma \in G : \sigma|_{K^H} = \mathrm{id}_{K^H}\}$. Certainly,
if $\sigma \in H$, then $\sigma$ restricts to the identity map on $K^H$, by definition of $K^H$;
thus, $H \le \ker(\omega)$. By the Fundamental Theorem of Group Homomorphisms
(Theorem 9.7), we have $Q \cong G/\ker(\omega)$. Since all groups in question are finite,
we have $|Q| = |G/\ker(\omega)| = |G|/|\ker(\omega)|$. But also, by parts (1) and (2),
we have $|Q| = [K^H : F] = [K : F]/[K : K^H] = |G|/|\mathrm{Gal}(K/K^H)|
= |G|/|H|$. This forces $|\ker(\omega)| = |H|$, and thus $\ker(\omega) = H$, which gives
$Q \cong G/H$, as desired.
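
To see the correspondence of Theorem 16.32 in a small case, consider $K = \mathbb{Q}[\sqrt{2}, \sqrt{3}]$ over $F = \mathbb{Q}$. The sketch below takes as given (this is a standard fact, assumed here rather than derived from the text) that $K$ is Galois over $\mathbb{Q}$ with basis $1, \sqrt{2}, \sqrt{3}, \sqrt{6}$ and that its four automorphisms are the sign changes $\sqrt{2} \mapsto \pm\sqrt{2}$, $\sqrt{3} \mapsto \pm\sqrt{3}$. Since each automorphism merely rescales the basis vectors by $\pm 1$, the fixed field of a subgroup $H$ is spanned by the basis vectors that every element of $H$ fixes. All names in the code are hypothetical.

```python
from itertools import combinations

# An automorphism is recorded as (s, t): sqrt2 -> s*sqrt2, sqrt3 -> t*sqrt3.
# It scales the basis 1, sqrt2, sqrt3, sqrt6 by the signs (1, s, t, s*t).
G = [(1, 1), (-1, 1), (1, -1), (-1, -1)]

def compose(g, h):
    """Composition of two sign-change automorphisms."""
    return (g[0] * h[0], g[1] * h[1])

def is_subgroup(H):
    """A finite subset containing the identity and closed under composition."""
    return (1, 1) in H and all(compose(g, h) in H for g in H for h in H)

subgroups = [set(H) for r in range(1, len(G) + 1)
             for H in combinations(G, r) if is_subgroup(set(H))]

def signs(g):
    return (1, g[0], g[1], g[0] * g[1])

def fixed_field_dimension(H):
    """Number of basis vectors fixed by every element of H, i.e. [K^H : Q]."""
    return sum(all(signs(g)[i] == 1 for g in H) for i in range(4))

for H in subgroups:
    print(sorted(H), len(H), fixed_field_dimension(H))
```

In every row, $|H| \cdot [K^H : \mathbb{Q}] = 4 = [K : \mathbb{Q}]$, which is the statement $|\mathrm{Gal}(K/K^H)| = |H| = [K : K^H]$ from parts (1) and (2); and larger subgroups have smaller fixed fields, in line with the order-reversing remark.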


16.5 Exercises 


Exercise 16.1. Complete the proof of Lemma 16.1 by showing that $\tilde{\sigma}$ satisfies
Equation 16.1 and that $\tilde{\sigma}$ is an isomorphism.


Exercise 16.2. Let $R \le S$ be commutative rings with 1, and define
$\mathrm{Aut}(S/R) := \{\sigma \in \mathrm{Aut}(S) : \sigma(a) = a \text{ for all } a \in R\}$.

(a) Prove that $\mathrm{Aut}(S/R) \le \mathrm{Aut}(S)$.

(b) Let $T$ be the image of the natural map from $\mathbb{Z}$ to $S$, so $T \le S$ (see
Definition 16.21). Prove that $\mathrm{Aut}(S/T) = \mathrm{Aut}(S)$.

Exercise 16.3. Let $\sigma : L \to M$ be an isomorphism of fields. Let $\tilde{\sigma} : L[x] \to
M[x]$ be the extension of $\sigma$ given by Lemma 16.1. Let $\tilde{L}$ be an extension field
of $L$, let $\alpha \in \tilde{L}$ be algebraic over $L$, and set $f = \mathrm{Irr}(\alpha, L, x)$ and $g = \tilde{\sigma}(f)$. Let
$\beta$ be a root of $g$ in some extension field $\tilde{M}$ of $M$. Prove that $\mathrm{Irr}(\beta, M, x) = g$.
Note: This exercise is used in the proof of Theorem 16.5, so you should avoid
using that theorem here!

Exercise 16.4. Let $K$ be a field, and let $\tau \in \mathrm{Aut}(K)$. Let $\alpha \in K$, and suppose
that $f \in K[x]$ and $f(\alpha) = 0$. Let $\beta = \tau(\alpha)$. Let $g = \tilde{\tau}(f)$, where $\tilde{\tau} : K[x] \to
K[x]$ is the isomorphism of Lemma 16.1. Prove that $g(\beta) = 0$.

Exercise 16.5. Let $\sigma : \mathbb{Q}[\sqrt{2}] \to \mathbb{Q}[\sqrt{2}]$ be the embedding which sends $\sqrt{2}$ to
$-\sqrt{2}$. What are the extensions of $\sigma$ to an embedding $\tau : \mathbb{Q}[\sqrt[4]{2}] \to \mathbb{Q}[\sqrt[4]{2}]$?
What is $\mathrm{Aut}(\mathbb{Q}[\sqrt[4]{2}]/\mathbb{Q})$?

Exercise 16.6. Let $F$ be a field and suppose that $f \in F[x]$ is monic and
irreducible in $F[x]$. Let $\alpha$, $\beta$ be roots of $f$ in some extension field $K$ of $F$.

(a) Prove that $f = \mathrm{Irr}(\alpha, F, x) = \mathrm{Irr}(\beta, F, x)$.

(b) Prove that $\alpha$ is a repeated root of $f$ iff $\beta$ is a repeated root of $f$.

Exercise 16.7. Let $K/F$ be a finite Galois extension. Let $\alpha, \beta \in K$. Prove
that $\beta = \sigma(\alpha)$ for some $\sigma \in \mathrm{Gal}(K/F)$ iff $\mathrm{Irr}(\alpha, F, x) = \mathrm{Irr}(\beta, F, x)$. In the
language of Definition 16.7, this says that two elements of $K$ are conjugate
over $F$ iff they have the same irreducible polynomial over $F$.


Exercise 16.8. Prove that if $K$ is separable over $F$, and $L$ is any field such
that $F \le L \le K$, then

(a) $K$ is separable over $L$, and

(b) $L$ is separable over $F$.

(c) Conclude that for any two fields $L_1$ and $L_2$ such that $F \le L_1 \le L_2 \le
K$, we have that $L_2$ is separable over $L_1$.


Exercise 16.9. Let $F$ be a field and let $f \in F[x] - (0)$. Prove that any two
splitting fields for $f$ over $F$ are isomorphic over $F$. That is, prove that if $K$ and
$L$ are splitting fields for $f$ over $F$, then there is an isomorphism $\sigma : K \to L$
such that $\sigma$ is the identity map on $F$.


Exercise 16.10. Let $F \le K$ be fields with $[K : F] = n < \infty$. Prove that for
every embedding $\sigma$ of $F$ into a field $L$, there are at most $n$ extensions of $\sigma$ to
an embedding of $K$ into $L$.


Exercise 16.11. Let K be a finite extension field of F. Prove that the following
conditions are equivalent:

(1) K = F[α₁, ..., αᵣ] for some α₁, ..., αᵣ ∈ K where each αᵢ is separable
over F.

(2) For every embedding σ of F into a field M, there exists a superfield L
of M such that the number of extensions of σ to an embedding τ : K → L
is equal to [K : F].

(3) There exists an extension field L of K such that the number of distinct
embeddings of K into L over F is equal to [K : F].

(4) K is separable over F.


Exercise 16.12. Suppose that K is a finite extension field of F.

(a) Prove that there is a finite extension field L of K with the properties
that (1) L is normal over F and (2) no smaller extension field of K is normal
over F. (Note: L is called a normal closure of K over F.)

(b) Prove that if K is separable over F, then so is L.

(c) Prove that L is unique up to isomorphism over K: that is, if M is
another normal closure of K over F, then there is an isomorphism from M to
L over K.


Exercise 16.13. Let K be a finite extension field of F. Prove that K is Galois
over F if and only if K is a splitting field over F for some non-constant
polynomial f ∈ F[x] such that f has no repeated roots.


Exercise 16.14. Let K be a finite extension field of F. Prove that
|Aut(K/F)| ≤ [K : F], with equality iff K is Galois over F.
Exercise 16.15. Decide, with proof, whether each of the following field exten- 
sions is (1) separable, and (2) normal. 

(a) Q[vV/10] over Q 

(b) Q[v2] over Q 

(c) F₂[x]/(x² + x + 1) over F₂, where F₂ = Z/2Z = {0, 1}.
Exercise 16.16. Prove Lemma 16.20. 


Exercise 16.17. Prove that the characteristic of a field must be 0 or a prime 
number. Hint: Exercise 11.11 may be a useful ingredient in the proof. 


Exercise 16.18. Prove that if K is an extension field of F, then char(K) =
char(F).




Exercise 16.19. [Binomial Theorem] Let R be a commutative ring with 1, let
a, b ∈ R, and let n ∈ N. Prove by induction on n that we have


(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} \cdot a^k b^{n-k}    (16.10)

where, as usual, \binom{n}{k} denotes the binomial coefficient
\binom{n}{k} = \frac{n!}{k!\,(n-k)!}. In particular, your proof should show
that \binom{n}{k} ∈ N.


Exercise 16.20. Let F be a field, and let f ∈ F[x] − (0). Suppose that K is an
extension field of F such that f factors completely over K. Also suppose that
g ∈ F[x] and g divides f in F[x]. Prove that g factors completely in K[x], and
that if α is a root of g in K with multiplicity m, then α is a root of f with
multiplicity M ≥ m.

Exercise 16.21. Prove Lemma 16.31. 


Exercise 16.22. Let K be a finite extension field of F, let L be an intermediate
field, and let σ : L → K be an embedding over F.

(a) Prove that σ is a linear transformation of F-vector spaces.

(b) Prove that dim_F(σ(L)) = dim_F(L). (Exercise 13.13 may be of use
here.) Conclude that if L = K, then σ is an automorphism of K.

Exercise 16.23. Let K/F be a finite Galois extension, and let G = Gal(K/F).
For S ⊆ G, define K^S := {α ∈ K : ∀σ ∈ S, σ(α) = α}. Prove that K^S = K^⟨S⟩.
Exercise 16.24. (The reader may find Exercise 15.14 relevant.) Let K be a 
finite field. 

(a) Prove that char(K) = p for some prime p, and hence that Z/pZ ↪ K
via the natural map from Z to K; let F be the image of this map, so that
Z/pZ ≅ F ≤ K. (F is called the prime subfield of K.)

(b) Let r = [K : F]. Prove that K is a splitting field for the polynomial
x^(p^r) − x over F.

(c) Prove that K is Galois over F.


Exercise 16.25. Let K be the splitting field for x⁴ − 2 over Q in C (see Example
15.27).

(a) Why is K Galois over Q?

(b) Why does G := Gal(K/Q) have order 8?

(c) Let F = Q[√2] and L = Q[⁴√2], so that Q ≤ F ≤ L ≤ K. Show that
L is Galois over F but not over Q.

(d) Use the previous parts of this exercise together with the Fundamental
Theorem of Galois Theory to show that there is a tower of groups H₁ ≤ H₂ ≤
G such that H₁ ⊴ H₂ and H₂ ⊴ G but H₁ ⋬ G. Thus, the property of being
a normal subgroup is not transitive.

(e) Find concise descriptions of all 8 elements of G. Hint: this is easy if
you start by asking where ⁴√2 can be mapped, and then asking where i can
be mapped.


Exercise 16.26. Let K be a finite extension field of Q. Show that the number
game on K is winnable iff Aut(K) is the trivial group.


17 


Direct Sums and Direct Products 


17.1 Introduction 


In this chapter, we present two constructions for combining several objects 
from the same category into a single object of that category. These notions ap- 
ply to all of the categories we have studied—groups, rings, and vector spaces. 
We model our constructions on a construction we already know for combining 
two or more sets into a single set: the Cartesian product. 




17.2 Direct Products 


We could start by defining a construction without giving any background, 
but we prefer to motivate it instead by setting it in context. So, in the spirit 
of abstract algebra, we start by asking: What properties does the Cartesian 
product have? 

Consider two sets A₁ and A₂, and let C = A₁ × A₂. First, notice that there
are “natural” maps (that is, functions) from C to Aᵢ. Namely, we can define
the “projection” maps πᵢ : C → Aᵢ by π₁(x, y) = x and π₂(x, y) = y for
(x, y) ∈ C.

Now, the existence of maps to both A₁ and A₂ is not enough to characterize
the Cartesian product of A₁ and A₂; lots of other sets besides C also have
maps to both Aᵢ’s. But what makes C special is that it captures all the
information in both A₁ and A₂ while being minimal in a certain sense. We
want to understand these properties in a purely functional way.

So suppose that we have a set D and maps αᵢ : D → Aᵢ. We claim
that we can interpose C between D and the Aᵢ’s. More precisely, there is a
map f : D → C through which the maps αᵢ factor. The reader is invited to
construct f and verify the previous statement before reading on.

The unique choice for the function f is given by the formula f(x) =
(α₁(x), α₂(x)) ∈ C for x ∈ D. With this choice of f, we have αᵢ = πᵢ ∘ f
for i ∈ {1, 2}. The following diagram illustrates the situation. The dashed line
labeled “∃! f” indicates that there is a unique function f from D to C which

DOI: 10.1201/9781003252139-17 185 




makes the diagram commute. We say that C satisfies the universal property 
illustrated by Diagram 17.1. 


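[Diagram 17.1: maps α₁ : D → A₁ and α₂ : D → A₂, projections π₁ : C → A₁ and π₂ : C → A₂, and a dashed arrow ∃! f : D → C]    (17.1)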
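The universal property of the Cartesian product can be checked mechanically on small finite sets. The following Python sketch is only an illustration: the sets A1, A2, D and the maps alpha1, alpha2 are hypothetical choices of ours, not taken from the text. It verifies that f(x) = (α₁(x), α₂(x)) makes the diagram commute and, by brute force over all functions from D to C, that no other function does.

```python
from itertools import product

# Hypothetical finite sets, chosen only for illustration.
A1 = [0, 1, 2]
A2 = ["u", "v"]
C = list(product(A1, A2))        # the Cartesian product A1 x A2
D = list(range(6))

def pi1(pair):
    return pair[0]

def pi2(pair):
    return pair[1]

# Hypothetical maps alpha_i : D -> A_i.
def alpha1(x):
    return x % 3

def alpha2(x):
    return "u" if x % 2 == 0 else "v"

def f(x):
    # The formula from the text: f(x) = (alpha1(x), alpha2(x)).
    return (alpha1(x), alpha2(x))

# The diagram commutes: alpha_i = pi_i o f on all of D.
commutes = all(pi1(f(x)) == alpha1(x) and pi2(f(x)) == alpha2(x) for x in D)

# Uniqueness by brute force: collect every function D -> C (stored as a
# tuple of images) that makes the diagram commute.
solutions = [
    images
    for images in product(C, repeat=len(D))
    if all(pi1(images[x]) == alpha1(x) and pi2(images[x]) == alpha2(x) for x in D)
]

print(commutes)          # True
print(len(solutions))    # 1
```

The brute-force search is feasible only because the sets are tiny (there are 6⁶ candidate functions); it is meant as a sanity check of the argument, not as a proof.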


Suppose that C̃ is another set with the same universal property as C. That
is, C̃ comes with maps π̃ᵢ : C̃ → Aᵢ, and Diagram 17.1 works (for any D and
αᵢ) with C̃ and π̃ᵢ in place of C and πᵢ. Then by the universal property of C,
setting D = C̃ and αᵢ = π̃ᵢ in Diagram 17.1, we get a unique map f : C̃ → C
such that π̃ᵢ = πᵢ ∘ f. Likewise, using the universal property of C̃, we get a
unique map g : C → C̃ such that πᵢ = π̃ᵢ ∘ g. It follows that πᵢ = πᵢ ∘ f ∘ g.
Set h = f ∘ g, so that πᵢ = πᵢ ∘ h.

Let (x, y) ∈ C, and write (a, b) = h(x, y). Then we have x = π₁(x, y) =
(π₁ ∘ f ∘ g)(x, y) = (π₁ ∘ h)(x, y) = π₁(h(x, y)) = π₁(a, b) = a. Similarly, we
find y = b. Therefore, h = f ∘ g = id_C.

We can arrive at the same result, f ∘ g = id_C, by using the universal
property of C alone, without taking advantage of the special form of C and
the πᵢ maps as we did in the previous paragraph. To see this, put D = C
and αᵢ = πᵢ in Diagram 17.1. Then by the universal property of C, there is
a unique map w : C → C such that πᵢ ∘ w = πᵢ for i ∈ {1, 2}. But we
know that, on the one hand, πᵢ ∘ h = πᵢ, while on the other hand, certainly
πᵢ ∘ id_C = πᵢ. Therefore, the uniqueness of w tells us that w = h = id_C.

The advantage of this last argument is that we can use it with the roles of
C and C̃ reversed, to find that g ∘ f = id_C̃. Thus, f and g are inverse to each
other; so f is a bijection from C̃ to C.

This result says that any set C satisfying the universal property of Diagram 
17.1 is essentially unique: there is a bijection from C to any other such set,
which furthermore is “compatible” with the projection maps. This situation is 
typical of objects with universal properties defined by commutative diagrams: 
they are typically unique up to a unique isomorphism. Thinking in terms of 
commutative diagrams and universal properties is the mainstay of a branch 
of mathematics known as category theory. 

The beauty of the category-theoretic approach is that we can use the 
same diagram, Diagram 17.1, to define the notion of a “product” for any 
type of algebraic objects, whether groups, rings, or vector spaces; the only 
modification we must make is to require that all the maps be “morphisms” of 
the appropriate type: group homomorphisms, ring homomorphisms, or linear 




transformations, respectively! The following definition illustrates this idea for 
rings. 


Definition 17.1. Let R₁ and R₂ be rings. Then a ring P is called a direct
product of R₁ with R₂ if there are ring homomorphisms πᵢ : P → Rᵢ such
that for any ring R and any pair of ring homomorphisms αᵢ : R → Rᵢ, there
exists a unique ring homomorphism f : R → P which makes the following
diagram commute:


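[Diagram 17.2: maps α₁ : R → R₁ and α₂ : R → R₂, maps π₁ : P → R₁ and π₂ : P → R₂, and a dashed arrow ∃! f : R → P]    (17.2)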


Remark 17.2. Technically, it is the ring P together with the maps πᵢ which
constitute the direct product of R₁ with R₂.


Remark 17.3. It can be shown that, if it exists, a direct product of rings is 
unique up to isomorphism (Exercise 17.1); the argument is essentially the 
same as that we used above for Cartesian products of sets. What we do not 
yet know is whether a direct product of two given rings must always exist. We 
shall show next that in fact it does. 


Suppose that R₁ and R₂ are rings. We would like to invent a ring P which
will be a direct product of R₁ with R₂. First, we try to identify the underlying
set for our product ring P. Perhaps we are being simplistic, but let us try to
set P = R₁ × R₂; that is, make the underlying set P equal to the Cartesian
product of the set R₁ with the set R₂. We’ll also take the map πᵢ to be
the ordinary projection map from R₁ × R₂ to Rᵢ. This way, we are at least
guaranteed the existence of a unique function f in Diagram 17.2, although
we do not know whether we can force f to be a ring homomorphism, as we
require. Now our task is to come up with binary operations + and · on P to
make this happen.

Let x, y ∈ P. We will try to determine what the sum x + y =: s must
look like. Since we require the maps πᵢ to be ring homomorphisms, we must
have πᵢ(s) = πᵢ(x + y) = πᵢ(x) + πᵢ(y). Remembering that P = R₁ × R₂, we
can write x = (x₁, x₂) and y = (y₁, y₂) for some x₁, y₁ ∈ R₁ and x₂, y₂ ∈ R₂.
Then πᵢ(x) = xᵢ and πᵢ(y) = yᵢ. It follows that we have πᵢ(s) = xᵢ + yᵢ. It is
a general fact for an element s of a Cartesian product that s = (π₁(s), π₂(s)).
Therefore, we have s = (x₁ + y₁, x₂ + y₂).

We have shown that we must have (x₁, x₂) + (y₁, y₂) = (x₁ + y₁, x₂ + y₂).
The same argument will show that (x₁, x₂) · (y₁, y₂) = (x₁ · y₁, x₂ · y₂). These
are the only choices for the operations + and · on R₁ × R₂ = P which could
possibly make P a direct product of R₁ with R₂ using the standard projection
maps πᵢ. The following result shows that our efforts have not been in vain.


Lemma 17.4. Let R₁ and R₂ be rings. Set P = R₁ × R₂, and define two
binary operations on P by the formulas (x₁, x₂) + (y₁, y₂) = (x₁ + y₁, x₂ + y₂)
and (x₁, x₂) · (y₁, y₂) = (x₁ · y₁, x₂ · y₂) for xᵢ, yᵢ ∈ Rᵢ. Then P is a ring, the
standard projection maps πᵢ : P → Rᵢ are ring homomorphisms, and P is a
direct product of R₁ with R₂.


Proof. We leave the proof that P is a ring to the reader as Exercise 17.2.
To see that πᵢ is a ring homomorphism, let a, b ∈ P. Then we can write
a = (a₁, a₂) and b = (b₁, b₂) for some a₁, b₁ ∈ R₁ and a₂, b₂ ∈ R₂. So we have
πᵢ(a + b) = πᵢ((a₁, a₂) + (b₁, b₂)) = πᵢ((a₁ + b₁, a₂ + b₂)) = aᵢ + bᵢ = πᵢ(a) + πᵢ(b),
as required. Similarly, we can see that πᵢ(a · b) = aᵢ · bᵢ = πᵢ(a) · πᵢ(b).

Now we verify that P has the universal property of the direct product.
Suppose that R is a ring and αᵢ : R → Rᵢ are two ring homomorphisms.
Then there is a unique function f : R → P such that πᵢ ∘ f = αᵢ for
i ∈ {1, 2}, namely f : a ↦ (α₁(a), α₂(a)). We must verify that f is a
ring homomorphism. So let a, b ∈ R. Then we have f(a + b) = (α₁(a +
b), α₂(a + b)) = (α₁(a) + α₁(b), α₂(a) + α₂(b)) (since αᵢ is a ring homomorphism)
= (α₁(a), α₂(a)) + (α₁(b), α₂(b)) (by definition of + on P) = f(a) + f(b) (by
definition of f). Similarly, we find that f(a · b) = f(a) · f(b). This completes
the proof.
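For readers who like to experiment, the statements of Lemma 17.4 can be tested on a small example. The sketch below is an illustration under our own assumptions: we take R₁ = Z/4Z and R₂ = Z/6Z, and for the test ring R we take Z/12Z with αᵢ given by reduction modulo 4 and modulo 6 (these are ring homomorphisms because 4 and 6 divide 12). All function names are ours.

```python
from itertools import product

M1, M2 = 4, 6                       # R1 = Z/4Z, R2 = Z/6Z (illustrative choice)
MODULI = {1: M1, 2: M2}
P = list(product(range(M1), range(M2)))

def add(x, y):
    # Componentwise addition on P = R1 x R2.
    return ((x[0] + y[0]) % M1, (x[1] + y[1]) % M2)

def mul(x, y):
    # Componentwise multiplication on P = R1 x R2.
    return ((x[0] * y[0]) % M1, (x[1] * y[1]) % M2)

def pi(i, x):
    # Standard projection pi_i : P -> R_i.
    return x[i - 1]

# The projections respect + and . on P.
projections_ok = all(
    pi(i, add(x, y)) == (pi(i, x) + pi(i, y)) % MODULI[i]
    and pi(i, mul(x, y)) == (pi(i, x) * pi(i, y)) % MODULI[i]
    for i in (1, 2) for x in P for y in P
)

N = 12                              # test ring R = Z/12Z (our assumption)

def alpha(i, a):
    # Reduction map alpha_i : Z/12Z -> R_i.
    return a % MODULI[i]

def f(a):
    # The induced map f : R -> P, a |-> (alpha_1(a), alpha_2(a)).
    return (alpha(1, a), alpha(2, a))

f_is_hom = all(
    f((a + b) % N) == add(f(a), f(b)) and f((a * b) % N) == mul(f(a), f(b))
    for a in range(N) for b in range(N)
)

print(projections_ok, f_is_hom)     # True True
```

Note that f need not be injective or surjective; the universal property only asks that it be the unique homomorphism compatible with the projections.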


Remark 17.5. We refer to the operations + and · defined on P in Lemma
17.4 as componentwise operations, just as we did in Example 13.4. For the 
categories of rings, groups, and vector spaces, it turns out that taking the 
Cartesian product of the underlying sets and defining componentwise opera- 
tions will produce a direct product. 
Notation 17.6. If Rᵢ are rings for i ∈ {1, 2}, then we will understand R₁ × R₂
to denote the ring formed from the Cartesian product of R₁ with R₂ using
componentwise operations.
Next we generalize the idea of direct product by allowing more than two 
“factors.” 


Definition 17.7. Let C = (Rᵢ)_{i∈I} be an indexed collection of rings. Then
a ring P is called a direct product of C if there are ring homomorphisms
πᵢ : P → Rᵢ for i ∈ I such that for any ring R and any collection of
ring homomorphisms αᵢ : R → Rᵢ, there exists a unique ring homomorphism
f : R → P with the property that, for each i ∈ I, the following diagram
commutes:


[Diagram 17.3: maps αᵢ : R → Rᵢ and πᵢ : P → Rᵢ, and a dashed arrow ∃! f : R → P]    (17.3)




The reader should verify that, when |I| = 2, Definition 17.7 is equivalent
to Definition 17.1. We shall mostly consider direct products with only finitely
many factors; however, Definition 17.7 applies whether the index set I is finite
or infinite.

Just as in the case with only two factors, it turns out that componentwise 
operations make the Cartesian product of underlying sets into a direct product 
of rings. For ease of notation, we only treat this result for a finite direct product 
(that is, when there are only a finite number of factors): 


Lemma 17.8. Let R₁, R₂, ..., Rₙ be rings. Then the Cartesian product P =
R₁ × R₂ × ⋯ × Rₙ forms a ring under componentwise operations; this ring is
a direct product of {Rᵢ}_{1≤i≤n}.


Proof. We leave the proof to the reader as Exercise 17.3. 


Notation 17.9. Let S₁, S₂, ..., Sₙ be sets. We denote the Cartesian product
S₁ × S₂ × ⋯ × Sₙ by ∏_{i=1}^{n} Sᵢ. In case the Sᵢ are rings, we take ∏_{i=1}^{n} Sᵢ to
mean the corresponding ring with componentwise operations.

Remark 17.10. The reader may have noticed that in Definition 17.7, the
collection C is unordered, and we did not require the index set I to be ordered
(although there can be repetition among the Rᵢ). In Definition 17.1 as well,
the order of the factors R₁ and R₂ is immaterial. This can be seen from the
symmetry of Diagram 17.2. The reader is invited in Exercise 17.4 to verify
this fact directly.


At this point, we give an application of direct products of rings to elemen- 
tary number theory. 


Theorem 17.11 (The Chinese Remainder Theorem). Let a₁, a₂, ..., aₙ
be positive integers which are pairwise relatively prime: that is, we have
gcd(aᵢ, aⱼ) = 1 whenever i ≠ j. Then there is a ring isomorphism


f : Z/(a₁ · a₂ ⋯ aₙ) → ∏_{i=1}^{n} Z/(aᵢ)    (17.4)


given by the formula

k + (a₁ · a₂ ⋯ aₙ) ↦ (k + (a₁), k + (a₂), ..., k + (aₙ))    (17.5)

for k ∈ Z.


Proof. For each i between 1 and n, consider the natural map νᵢ : Z → Z/(aᵢ),
k ↦ k + (aᵢ). Set a = a₁ · a₂ ⋯ aₙ. Then for each i, we have a ∈ (aᵢ), and
therefore (a) ⊆ (aᵢ). It follows from Exercise 11.24 that νᵢ factors through
Z/(a) to give a ring homomorphism αᵢ : Z/(a) → Z/(aᵢ), k + (a) ↦ k + (aᵢ).
Set P = ∏_{i=1}^{n} Z/(aᵢ), and let πᵢ : P → Z/(aᵢ) be the standard projection
map. Then P is a direct product of the collection of rings {Z/(aᵢ)}_{1≤i≤n} by
Lemma 17.8. By the universal property of the direct product, we have a ring
homomorphism f : Z/(a) → P such that αᵢ = πᵢ ∘ f for each i. Now we
must have f : k + (a) ↦ (k + (a₁), ..., k + (aₙ)) for k ∈ Z.

Let x ∈ ker(f), and write x = k + (a) with k ∈ Z. Then we have f(x) =
0_P = (0 + (a₁), ..., 0 + (aₙ)). Since f(x) = (k + (a₁), ..., k + (aₙ)), it follows
that k ∈ (aᵢ) for each i; that is, aᵢ divides k for each i. Because the aᵢ’s are
pairwise relatively prime, this forces their product to divide k. So a | k, and
thus k + (a) = 0 + (a), which is the zero element of the ring Z/(a). Therefore,
ker(f) = (0). It follows from Exercise 11.23 that f is injective.

Observe next that we have |Z/(a)| = a = ∏_{i=1}^{n} aᵢ = ∏_{i=1}^{n} |Z/(aᵢ)| =
|∏_{i=1}^{n} Z/(aᵢ)|. Thus, the domain and codomain of f have the same finite size.
Since f is injective, it follows that f must also be surjective. Thus, f is an
isomorphism. This completes the proof.


The Chinese Remainder Theorem is sometimes described in terms of sys- 
tems of congruences. We treat this result next. 


Corollary 17.12. Let a₁, a₂, ..., aₙ be positive integers which are pairwise
relatively prime. Let c₁, c₂, ..., cₙ be arbitrary integers, and set a = a₁ ·
a₂ ⋯ aₙ. Then the system of congruences


x ≡ cᵢ (mod aᵢ), 1 ≤ i ≤ n    (17.6)


has a unique solution x modulo a. That is, there is an integer x which satisfies
the system of congruences (17.6), and any other integer solution y is congruent
to x modulo a.


Proof. Set c = (c₁ + (a₁), ..., cₙ + (aₙ)) ∈ ∏_{i=1}^{n} Z/(aᵢ). By Theorem 17.11,
there is a unique element b ∈ Z/(a) such that f(b) = c. Writing b = x + (a),
we see that x is the desired solution.
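The proof above shows that a solution exists by a counting argument, without telling us how to find it. As a complement, here is a short Python sketch that computes the solution using the standard constructive formula (a different technique from the proof in the text): for each i, multiply cᵢ by mᵢ = a/aᵢ and by an inverse of mᵢ modulo aᵢ, and add up the results. The function names crt and f are ours; the three-argument pow with exponent −1 requires Python 3.8 or later.

```python
from math import gcd, prod

def crt(residues, moduli):
    """Return the unique x with 0 <= x < a and x = c_i (mod a_i) for all i,
    where a is the product of the moduli."""
    for i in range(len(moduli)):
        for j in range(i + 1, len(moduli)):
            if gcd(moduli[i], moduli[j]) != 1:
                raise ValueError("moduli must be pairwise relatively prime")
    a = prod(moduli)
    x = 0
    for c_i, a_i in zip(residues, moduli):
        m_i = a // a_i                  # product of the other moduli
        inverse = pow(m_i, -1, a_i)     # inverse of m_i modulo a_i
        x += c_i * m_i * inverse
    return x % a

def f(k, moduli):
    # The map of Theorem 17.11 on representatives: k |-> (k mod a_1, ..., k mod a_n).
    return tuple(k % a_i for a_i in moduli)

moduli = [3, 5, 7]
x = crt([2, 3, 2], moduli)
print(x)                                # 23

# f is a bijection from Z/(105) onto the product, as the theorem asserts.
images = {f(k, moduli) for k in range(prod(moduli))}
print(len(images) == prod(moduli))      # True
```

Here 23 leaves remainders 2, 3, 2 on division by 3, 5, 7 respectively, and every other integer solution differs from 23 by a multiple of 105.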


17.3 Direct Sums


By reversing the arrows in Diagram 17.3, we get the definition of our second 
way to combine algebraic objects, the coproduct. To keep this definition gen- 
eral, we use the terms object and morphism; these can be replaced by group, 
ring, or vector space and by group homomorphism, ring homomorphism, or 
linear transformation, respectively. 


Definition 17.13. Let C = (Xᵢ)_{i∈I} be an indexed collection of objects. Then
an object S is called a coproduct of C if there are morphisms τᵢ : Xᵢ → S for
i ∈ I such that for any object X and any collection of morphisms αᵢ : Xᵢ →
X, there exists a unique morphism f : S → X with the property that, for
each i ∈ I, the following diagram commutes:


[Diagram 17.7: maps τᵢ : Xᵢ → S and αᵢ : Xᵢ → X, and a dashed arrow ∃! f : S → X]    (17.7)


Remark 17.14. The terms object and morphism are actually technical terms 
in category theory, where we speak of the category of groups, the category of 
rings, and so on. We also speak of the category of sets, in which a morphism 
from a set X to a set Y is simply a function from X to Y. See Project 23.4 
for some details. 


Remark 17.15. Just as in the case of direct products, the notion of coproduct 
does not depend on any ordering of the collection of component objects. 


Next, we investigate whether finite coproducts exist in the category of
groups. We start with a collection of just 2 groups, G₁ and G₂. As with
direct products, our first candidate for a coproduct of G₁ with G₂ will be the
Cartesian product S = G₁ × G₂, with multiplication defined componentwise.
In order to proceed with the proof, we need to find a group homomorphism
τᵢ : Gᵢ → S for i ∈ {1, 2}. Now if a ∈ G₁, say, then it seems natural for
τ₁ to map a to an element of the form (a, x₂) ∈ S; but what should x₂ be?
It seems unreasonable for x₂ to depend on a at all, since the groups G₁ and
G₂ may have nothing to do with each other. Therefore, we choose x₂ to be
a distinguished element of G₂. The only distinguished element in a generic
group is its identity element. So we set τ₁(a) = (a, e₂), where eᵢ denotes the
identity element of Gᵢ. Similarly, we define τ₂ : G₂ → S by the formula
τ₂(a) = (e₁, a) for a ∈ G₂.

Next we check whether S satisfies the universal property of the coproduct.
So suppose that G is a group and αᵢ : Gᵢ → G are group homomorphisms for
i ∈ {1, 2}. We must find a group homomorphism f : S → G with the property
that f ∘ τᵢ = αᵢ for each i. Let a = (a₁, a₂) ∈ S. Set ãᵢ = τᵢ(aᵢ). Then we require
f(ãᵢ) = f(τᵢ(aᵢ)) = αᵢ(aᵢ) for each i. So we need to have f(ã₁ · ã₂) = f(ã₁) ·
f(ã₂) (since f is required to be a group homomorphism) = α₁(a₁) · α₂(a₂).
But ã₁ · ã₂ = (a₁, e₂) · (e₁, a₂) = (a₁ · e₁, e₂ · a₂) = (a₁, a₂) = a, so we must
have f(a) = f(a₁, a₂) = α₁(a₁) · α₂(a₂). This formula defines a function from
S to G, and it is the only candidate for our group homomorphism f.

Now we ask whether this function f is indeed a group homomorphism.
So let a, b ∈ S with a = (a₁, a₂) and b = (b₁, b₂). Then we have f(a · b) =
f(a₁ · b₁, a₂ · b₂) = α₁(a₁ · b₁) · α₂(a₂ · b₂) = α₁(a₁) · α₁(b₁) · α₂(a₂) · α₂(b₂) (since
the αᵢ are group homomorphisms). On the other hand, we have f(a) · f(b)
= α₁(a₁) · α₂(a₂) · α₁(b₁) · α₂(b₂). In order for f to be a group homomorphism,
then, we must have α₁(b₁) · α₂(a₂) = α₂(a₂) · α₁(b₁) for all b₁ ∈ G₁ and all
a₂ ∈ G₂. But this is not automatically true! It is true, however, if all our groups
are abelian. Thus, in the following result, we prove that finite coproducts exist
in the category of abelian groups.
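To see the failure concretely, one can try a small non-abelian target group. The following Python sketch is our own illustration, not an example from the text: we take G₁ = G₂ = Z/2Z (written additively), let G be the group of permutations of {0, 1, 2}, and let α₁ and α₂ send the generator to two different transpositions. The candidate map f(a₁, a₂) = α₁(a₁) · α₂(a₂) then fails to be a homomorphism.

```python
# Permutations of {0, 1, 2} are stored as tuples p with p[i] the image of i.
E = (0, 1, 2)          # identity
SWAP01 = (1, 0, 2)     # transposition exchanging 0 and 1
SWAP12 = (0, 2, 1)     # transposition exchanging 1 and 2

def compose(p, q):
    # (p . q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(3))

# Hypothetical homomorphisms alpha_i : Z/2Z -> G (each transposition has order 2).
def alpha1(a):
    return SWAP01 if a % 2 else E

def alpha2(a):
    return SWAP12 if a % 2 else E

def mul_S(a, b):
    # Componentwise operation on S = Z/2Z x Z/2Z.
    return ((a[0] + b[0]) % 2, (a[1] + b[1]) % 2)

def f(a):
    # The only candidate from the text: f(a1, a2) = alpha1(a1) . alpha2(a2).
    return compose(alpha1(a[0]), alpha2(a[1]))

S = [(x, y) for x in (0, 1) for y in (0, 1)]
failures = [(a, b) for a in S for b in S if f(mul_S(a, b)) != compose(f(a), f(b))]

print(((0, 1), (1, 0)) in failures)   # True
```

For a = (0, 1) and b = (1, 0), the value f(a · b) is the product of the two transpositions in one order, while f(a) · f(b) is their product in the other order, and these two 3-cycles differ.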


Lemma 17.16. Let C = {Gᵢ}_{1≤i≤n} be a finite collection of abelian groups
(not necessarily distinct). Set S = G₁ × G₂ × ⋯ × Gₙ, the Cartesian product
of the sets G₁, ..., Gₙ, with multiplication defined componentwise. Let eᵢ
denote the identity element of Gᵢ. For i ∈ {1, 2, ..., n}, define a function
τᵢ : Gᵢ → S by the formula τᵢ(a) = (e₁, ..., a, ..., eₙ) for a ∈ Gᵢ, where
the i-th component is equal to a. Then S is a group, the functions τᵢ are group
homomorphisms, and S is a coproduct of the collection C in the category of
abelian groups.


Proof. We leave to the reader the proof that S forms an abelian group under
componentwise multiplication (Exercise 17.10). To verify that τᵢ is a group
homomorphism, let a, b ∈ Gᵢ. Then we have τᵢ(a · b) = (e₁, ..., a · b, ..., eₙ),
while τᵢ(a) · τᵢ(b) = (e₁, ..., a, ..., eₙ) · (e₁, ..., b, ..., eₙ) = (e₁ · e₁, ..., a ·
b, ..., eₙ · eₙ) = (e₁, ..., a · b, ..., eₙ), as required.

Next we verify that S satisfies the universal property of the coproduct in
the category of abelian groups. So suppose that G is an abelian group and
αᵢ : Gᵢ → G are group homomorphisms for i between 1 and n. Suppose that
f : S → G is a group homomorphism with the property that f ∘ τᵢ = αᵢ for
each i. Let a = (a₁, ..., aₙ) ∈ S. Set ãᵢ = τᵢ(aᵢ) = (e₁, ..., aᵢ, ..., eₙ), where
the aᵢ occurs in the i-th component. Then we have f(ãᵢ) = f(τᵢ(aᵢ)) = αᵢ(aᵢ)
for each i. So f(ã₁ ⋯ ãₙ) = f(ã₁) ⋯ f(ãₙ) (since f is a group homomorphism)
= α₁(a₁) ⋯ αₙ(aₙ). But ã₁ ⋯ ãₙ = a, so f(a) = α₁(a₁) ⋯ αₙ(aₙ). This
proves that such an f is unique if it exists.

To finish the proof, we define the function f : S → G by the formula
f(a) = α₁(a₁) ⋯ αₙ(aₙ) for a = (a₁, ..., aₙ) ∈ S. Then f ∘ τᵢ = αᵢ for each i,
because each αⱼ sends eⱼ to the identity element of G. It remains to show that
f is indeed a group homomorphism. So let a, b ∈ S with a = (a₁, ..., aₙ) and
b = (b₁, ..., bₙ). Then we have f(a · b) = f(a₁ · b₁, ..., aₙ · bₙ) (by defini-
tion of multiplication on S) = α₁(a₁ · b₁) ⋯ αₙ(aₙ · bₙ) (by definition of f)
= α₁(a₁) · α₁(b₁) ⋯ αₙ(aₙ) · αₙ(bₙ) (since the αᵢ are group homomorphisms)
= α₁(a₁) ⋯ αₙ(aₙ) · α₁(b₁) ⋯ αₙ(bₙ) (since G is abelian) = f(a) · f(b).
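The construction in Lemma 17.16 can be tried out on small cyclic groups. In the Python sketch below, which is only an illustration with groups and maps of our own choosing, we take G₁ = Z/2Z, G₂ = Z/3Z and G = Z/6Z (all written additively), with α₁(a) = 3a and α₂(a) = 2a. The code checks that the induced map f(a₁, a₂) = α₁(a₁) + α₂(a₂) is a homomorphism and satisfies f ∘ τᵢ = αᵢ.

```python
from itertools import product

# Illustrative choice: G1 = Z/2Z, G2 = Z/3Z, G = Z/6Z, all written additively.
N1, N2, N = 2, 3, 6
S = list(product(range(N1), range(N2)))

def add_S(a, b):
    # Componentwise operation on S = G1 x G2.
    return ((a[0] + b[0]) % N1, (a[1] + b[1]) % N2)

def tau1(a):
    # tau_1 : G1 -> S, a |-> (a, 0)
    return (a % N1, 0)

def tau2(a):
    # tau_2 : G2 -> S, a |-> (0, a)
    return (0, a % N2)

# Hypothetical homomorphisms alpha_i : G_i -> G.
def alpha1(a):
    return (3 * a) % N

def alpha2(a):
    return (2 * a) % N

def f(a):
    # The induced map: f(a1, a2) = alpha1(a1) + alpha2(a2).
    return (alpha1(a[0]) + alpha2(a[1])) % N

f_is_hom = all(f(add_S(a, b)) == (f(a) + f(b)) % N for a in S for b in S)
commutes = (all(f(tau1(a)) == alpha1(a) for a in range(N1))
            and all(f(tau2(a)) == alpha2(a) for a in range(N2)))

print(f_is_hom, commutes)   # True True
```

Since G is abelian here, the obstruction found in the discussion before the lemma does not arise.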


Notation 17.17. The group S defined in Lemma 17.16 is denoted G₁ ⊕ G₂ ⊕ ⋯ ⊕
Gₙ, or

⊕_{i=1}^{n} Gᵢ.    (17.8)

The symbol ⊕ is read “direct sum.” Even though this is just one particular
construction of a coproduct, we often use the term direct sum to mean
coproduct in the category of abelian groups. This should cause no confusion 
in practice. Some authors distinguish between an “external direct sum” as 
given by Equation 17.8 above, and an “internal direct sum,” by which they 
mean a group G for which the natural map of Definition 17.19 below is an 
isomorphism. 




Remark 17.18. We have discovered that coproducts exist in the category of 
abelian groups, where we call them direct sums; accordingly, we will use addi- 
tive notation for our group operations. Indeed, the term “direct sum” and the 
notation ⊕ already suggest that we should be doing this! On the other hand,
we have not proved that coproducts do not exist in the category of all groups; 
see Exercise 17.13. 


Next, we turn our point of view around, and ask: under what conditions 
can we “split” a given abelian group G into a direct sum? 

First, we observe that if G = ⊕_{i=1}^{n} Gᵢ, then each “direct summand” Gᵢ is
isomorphic to a subgroup of G, since the maps τᵢ defined in Lemma 17.16 are
injective. Therefore, we look for direct summands of G inside of G itself.

Suppose that G is an abelian group with subgroups G₁, ..., Gₙ. We have
natural embeddings αᵢ : Gᵢ → G for each i, so the defining property of
the direct sum (Definition 17.13) gives us a unique group homomorphism
f : ⊕_{i=1}^{n} Gᵢ → G such that f ∘ τᵢ = αᵢ for each i. Observe that we must have
f(g₁, ..., gₙ) = g₁ + ⋯ + gₙ, since f is a group homomorphism.

The map f described in the preceding paragraph occurs often enough that 
we give it a name. 


Definition 17.19. Let (G, +) be an abelian group, and let G₁, ..., Gₙ be
subgroups of G. The natural map from ⊕_{i=1}^{n} Gᵢ to G is the group homomorphism
sending (g₁, ..., gₙ) to g₁ + ⋯ + gₙ.


Now that we have a map between G and a direct sum, we look for a 
condition which guarantees this map to be an isomorphism. The relevant 
condition is very similar to the condition which defines the notion of a basis 
of a vector space. 


Proposition 17.20. Let (G, +) be an abelian group, and let G₁, ..., Gₙ be
subgroups of G. Then the following are equivalent:

(i) For each g ∈ G, there exist unique elements g₁, ..., gₙ, with gᵢ ∈ Gᵢ,
such that g = g₁ + ⋯ + gₙ.

(ii) The natural map σ : ⊕_{i=1}^{n} Gᵢ → G is an isomorphism.


Proof. (i) ⇒ (ii): Let σ : ⊕_{i=1}^{n} Gᵢ → G be the natural map. Now σ must be
injective by the uniqueness hypothesis on the gᵢ’s, and σ is surjective by the
existence hypothesis on the gᵢ’s. Thus, σ is a bijective group homomorphism,
as required.

(ii) ⇒ (i): This is likewise immediate from the definition of the natural
map.


The conclusion of Proposition 17.20 still holds if we drop the “unique”
condition from the preceding hypothesis, and replace it with the condition
that 0_G has only the trivial representation as a sum of elements from the
Gᵢ’s.




Corollary 17.21. Let (G, +) be an abelian group, and let G₁, ..., Gᵣ be
subgroups of G. Suppose that

(i) for each g ∈ G, there are elements g₁, ..., gᵣ, with gᵢ ∈ Gᵢ, such that
g = g₁ + ⋯ + gᵣ, and

(ii) if g₁ + ⋯ + gᵣ = 0_G, with gᵢ ∈ Gᵢ, then gᵢ = 0 for each i.


Then ⊕_{i=1}^{r} Gᵢ ≅ G via the natural map (g₁, ..., gᵣ) ↦ g₁ + ⋯ + gᵣ.


Proof. This proof is very similar to the proof of Lemma 13.21 from Definition 
13.14; we leave the proof to the reader as Exercise 17.11. 
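The two hypotheses of Corollary 17.21 are easy to test by machine for subgroups of a finite cyclic group. The Python sketch below is an illustration with examples of our own choosing inside G = Z/12Z; the function name natural_map_report is hypothetical. It reports whether every element of G is a sum of elements from the given subgroups, and whether 0 has only the trivial representation.

```python
from itertools import product

def natural_map_report(subgroups, n):
    """For subgroups of Z/nZ (given as lists starting with 0), return a pair
    (surjective, trivial_zero) describing the natural map
    (g_1, ..., g_r) |-> g_1 + ... + g_r."""
    tuples = list(product(*subgroups))
    sums = [sum(t) % n for t in tuples]
    surjective = set(sums) == set(range(n))
    zero_representations = [t for t, s in zip(tuples, sums) if s == 0]
    trivial_zero = zero_representations == [tuple(0 for _ in subgroups)]
    return surjective, trivial_zero

n = 12
print(natural_map_report([[0, 4, 8], [0, 3, 6, 9]], n))             # (True, True)
print(natural_map_report([[0, 6], [0, 3, 6, 9]], n))                # (False, False)
print(natural_map_report([[0, 2, 4, 6, 8, 10], [0, 3, 6, 9]], n))   # (True, False)
```

Only in the first case do both conditions hold, so only there is Z/12Z the direct sum of the two chosen subgroups. The third case shows that condition (i) alone is not enough: 6 + 6 = 0 is a nontrivial representation of 0.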


17.4 Exercises 


Exercise 17.1. Let R₁ and R₂ be rings. Prove that a direct product of R₁ with
R₂ is unique up to isomorphism: that is, any two direct products of R₁ with
R₂ are isomorphic.


Exercise 17.2. Let R₁ and R₂ be rings. Prove that R₁ × R₂ forms a ring under
componentwise addition and multiplication.


Exercise 17.3. Prove Lemma 17.8.


Exercise 17.4. Prove by exhibiting an explicit isomorphism that for any two
rings R₁ and R₂, we have R₁ × R₂ ≅ R₂ × R₁. Explain why this result also
follows from Exercise 17.1.


Exercise 17.5 (Direct Product of Groups). Let G₁, G₂ be two groups.

(a) What does it mean to say that a group G is a direct product of G₁
with G₂?

(b) Let G = G₁ × G₂, with multiplication in G defined componentwise.
Prove that G is a direct product of G₁ with G₂.

(c) Suppose that G is a group with G₁, G₂ ≤ G. Define a function σ :
G₁ × G₂ → G by (g₁, g₂) ↦ g₁g₂. Prove that σ is a group isomorphism
if and only if all of the following conditions are satisfied: ⟨G₁ ∪ G₂⟩ = G,
G₁ ∩ G₂ = {e_G}, and Gᵢ ⊴ G for i ∈ {1, 2}.

(d) Prove that in case G₁ and G₂ are abelian, then G₁ × G₂ under
componentwise multiplication is also abelian, and is a direct sum of G₁ and G₂;
in fact, G₁ × G₂ = G₁ ⊕ G₂, and this generalizes to n factors.


Exercise 17.6. Let R₁, R₂, ..., Rₙ be rings with 1. Prove that (∏_{i=1}^{n} Rᵢ)* =
∏_{i=1}^{n} Rᵢ*.

Exercise 17.7. Let n be a positive integer, and set R = Z/nZ. We denote |R*|
by φ(n). Note: φ is called Euler’s totient function or the Euler phi function.

(a) Deduce from Exercise 12.17 that we have


φ(n) = |{a ∈ Z : 1 ≤ a ≤ n and gcd(a, n) = 1}|.




(b) Show that if p is a positive prime integer and r is a positive integer,
then φ(p^r) = p^r − p^(r−1) = p^(r−1)(p − 1).

(c) Use the Chinese Remainder Theorem and Exercises 17.5 and 17.6 to
deduce that if n > 1 and n = ∏_{i=1}^{k} pᵢ^(rᵢ) is the prime factorization of n, then
(Z/nZ)* ≅ ⊕_{i=1}^{k} (Z/pᵢ^(rᵢ)Z)* via the map a + nZ ↦ (a + p₁^(r₁)Z, ..., a + pₖ^(rₖ)Z).
Conclude that φ(n) = ∏_{i=1}^{k} pᵢ^(rᵢ−1)(pᵢ − 1).

(d) Prove that if m is an integer such that gcd(m, n) = 1, then we have
m^φ(n) ≡ 1 (mod n). This result is known as Euler’s Theorem; it is a
generalization of Fermat’s Little Theorem (see Exercise 12.13).
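Before attempting the proofs in Exercise 17.7, the reader may wish to experiment numerically. The following Python sketch, with function names of our own choosing, computes φ(n) both by direct counting as in part (a) and by the product formula of part (c), and then tests the congruence of part (d) for small n. It is a numerical check only and does not replace the proofs.

```python
from math import gcd

def phi_by_count(n):
    # Part (a): count the a with 1 <= a <= n and gcd(a, n) = 1.
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def prime_factorization(n):
    # Trial division; returns a dict {prime: exponent}.
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def phi_by_formula(n):
    # Part (c): phi(n) is the product of p^(r-1) * (p - 1) over prime powers p^r.
    result = 1
    for p, r in prime_factorization(n).items():
        result *= p ** (r - 1) * (p - 1)
    return result

print(phi_by_count(360), phi_by_formula(360))    # 96 96

# Part (d), Euler's Theorem, checked for 2 <= n < 200.
euler_ok = all(
    pow(m, phi_by_formula(n), n) == 1
    for n in range(2, 200)
    for m in range(1, n)
    if gcd(m, n) == 1
)
print(euler_ok)                                   # True
```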


Exercise 17.8. Use Exercise 17.7 to prove that if m and n are positive integers
with gcd(m, n) = 1, then we have φ(mn) = φ(m) · φ(n).

Exercise 17.9 (Factoring a Commutative Ring). Let R be a commutative ring
with 1, and suppose that e ∈ R is an element such that e² = e.

(a) Prove that (1 − e)² = 1 − e.

(b) Prove that eR and (1 − e)R are commutative subrings of R which
possess multiplicative identity elements.

(c) Prove that R ≅ eR × (1 − e)R.
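A small example may help to make Exercise 17.9 concrete. The Python sketch below is our own illustration: it takes R = Z/6Z and the idempotent e = 3, lists the sets eR and (1 − e)R, and checks that the map r ↦ (e·r, (1 − e)·r) is a bijection which respects addition and multiplication. The name split for this map is hypothetical, and checking one example is of course not a proof.

```python
N = 6                       # R = Z/6Z (illustrative choice)
e = 3                       # an idempotent: 3 * 3 = 9 = 3 in Z/6Z
R = list(range(N))

e_is_idempotent = (e * e) % N == e
one_minus_e = (1 - e) % N   # equals 4 in Z/6Z

eR = sorted({(e * r) % N for r in R})
fR = sorted({(one_minus_e * r) % N for r in R})
print(eR, fR)               # [0, 3] [0, 2, 4]

def split(r):
    # Candidate isomorphism R -> eR x (1 - e)R.
    return ((e * r) % N, (one_minus_e * r) % N)

def add_pairs(x, y):
    return ((x[0] + y[0]) % N, (x[1] + y[1]) % N)

def mul_pairs(x, y):
    return ((x[0] * y[0]) % N, (x[1] * y[1]) % N)

bijective = len({split(r) for r in R}) == len(eR) * len(fR) == N
additive = all(split((r + s) % N) == add_pairs(split(r), split(s)) for r in R for s in R)
multiplicative = all(split((r * s) % N) == mul_pairs(split(r), split(s)) for r in R for s in R)

print(e_is_idempotent, bijective, additive, multiplicative)   # True True True True
```

Notice that split(1) = (3, 4), and that 3 and 4 act as the multiplicative identities of eR and (1 − e)R respectively, which is the content of part (b) in this example.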
Exercise 17.10. Prove that the set S defined in Lemma 17.16 forms an abelian 
group under componentwise multiplication. 


Exercise 17.11. Prove Corollary 17.21. 


Exercise 17.12 (Arbitrary Direct Sums). The reader may have noticed that 
the formula defining the natural map from a direct sum in Definition 17.19 
does not seem to bode well for the direct sum of an infinite collection, since 
we would need to sum an infinite series. However, things are not really so 
bad. Given an arbitrary indexed collection of abelian groups C = (Gᵢ)_{i∈I},
let ⊕_{i∈I} Gᵢ denote the subset of ∏_{i∈I} Gᵢ where all but finitely many of the
components are 0. Prove that ⊕_{i∈I} Gᵢ is a direct sum of the collection C. Note:
we use the notation (gᵢ)_{i∈I} to denote a typical element of the direct product
∏_{i∈I} Gᵢ whose i-th component is gᵢ. This is consistent with our notation for an
indexed collection (see Example 1.32) with indexing function g : I → ⋃_{i∈I} Gᵢ
and the additional requirement that gᵢ ∈ Gᵢ for each i ∈ I.


Exercise 17.13 (Coproduct of Two Groups). Let G₁ and G₂ be arbitrary
groups (not necessarily abelian). In the text, we could not force Diagram 17.7
to commute when S is the Cartesian product of G₁ with G₂. But it turns out
we just didn’t try hard enough to find a suitable underlying set S. For i ∈ {1, 2},
let Yᵢ = {x_{g,i} : g ∈ Gᵢ}, where the x_{g,i} are distinct symbols. Let Fᵢ = Fr(Yᵢ) be
the free group on Yᵢ, and let F = Fr(Y₁ ∪ Y₂). Let σᵢ : Fᵢ → Gᵢ be the group
homomorphism induced by x_{g,i} ↦ g. Note that Fᵢ is a subgroup of F, and let
N = ⟨⟨ker(σ₁) ∪ ker(σ₂)⟩⟩ ⊴ F and S = F/N.

(a) Prove that the map τᵢ : Gᵢ → S given by the formula g ↦ x_{g,i}N is a
well-defined group homomorphism.

(b) Prove that the group S together with the maps τᵢ satisfy Definition
17.13, and thus S is a coproduct of G₁ and G₂ in the category of groups.




(c) As mentioned in the text, we only use the term “direct sum” when 
we require all of our groups to be abelian, that is, in the category of abelian 
groups. The reader may wonder whether the group S constructed above will
be isomorphic to the direct sum G₁ ⊕ G₂ in case both G₁ and G₂ are abelian.
Prove that the answer in general is no. Hint: Even if both Gᵢ are abelian, we
can have group homomorphisms αᵢ : Gᵢ → G where G is non-abelian.


Exercise 17.14. Let G₁ and G₂ be abelian groups.

(a) Use the construction of Exercise 17.13, but with free groups replaced
by free abelian groups, to produce an abelian group S, and prove that S is a
coproduct of G₁ and G₂ in the category of abelian groups. (See Exercise 10.5
for the definition of a free abelian group.)

(b) Prove a uniqueness result analogous to that of Exercise 17.1 in the
category of abelian groups, and use this to prove that the group S from part
(a) above is isomorphic to G₁ ⊕ G₂.


18 


The Structure of Finite Abelian Groups 


18.1 Introduction 


In this chapter, we attempt to decompose finite abelian groups into direct 
sums until we cannot go further. In the end, we shall see that all finite abelian 
groups are really just things cobbled together from the group (Z,+) using 
quotients and direct sums. 


18.2 Preliminaries 


Looking back at Corollary 17.21, we would like to be able to say that the
image of the natural map from $\bigoplus_{i=1}^{n} G_i$ to $G$ is the “sum” of the subgroups $G_1$
through $G_n$. More generally, if we are in a group which may not be abelian, we
would like to speak of the “product” of two subsets of the group. Therefore,
to start our discussion, we generalize the idea of the product of a set with an
element, by considering the product of two sets.


Definition 18.1. Let $S$ be a set with an associative binary operation $\cdot$, and
let $A, B \subseteq S$. Then we define the setwise product of $A$ with $B$ to be

$$A \cdot B = \{a \cdot b : a \in A \text{ and } b \in B\}. \tag{18.1}$$


Remark 18.2. We note that in case $A$ or $B$ is a singleton set, this definition
agrees with the notion of a coset: we have $\{a\} \cdot B = a \cdot B$ and $A \cdot \{b\} = A \cdot b$.


Remark 18.3. Just as with cosets, it is true that setwise products are
associative: that is, we always have $(A \cdot B) \cdot C = A \cdot (B \cdot C)$ for $A, B, C \subseteq S$.
The proof is a straightforward consequence of the associativity of the
original $\cdot$ operation. As a result, we write expressions such as $A \cdot B \cdot C$ without
parentheses.


Remark 18.4. When $G$ is an abelian group, we use additive notation for the
group operation, and speak of the setwise sum
$A + B := \{a + b : a \in A \text{ and } b \in B\}$.
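To make the setwise sum concrete, here is a minimal Python sketch that computes $A + B$ inside the additive group $\mathbb{Z}_{12}$. The function name `setwise_sum` and the particular subsets are our own choices for illustration; they do not come from the text.

```python
def setwise_sum(A, B, n):
    """Return A + B = {a + b : a in A, b in B}, computed in Z_n."""
    return {(a + b) % n for a in A for b in B}

n = 12
A = {0, 4, 8}   # the subgroup generated by 4
B = {0, 6}      # the subgroup generated by 6
C = {1, 2}      # an arbitrary subset, not a subgroup

print(sorted(setwise_sum(A, B, n)))      # [0, 2, 4, 6, 8, 10]
print(sorted(setwise_sum({1}, B, n)))    # [1, 7], the coset 1 + B

# Associativity of setwise sums (Remark 18.3), checked on this example.
left = setwise_sum(setwise_sum(A, B, n), C, n)
right = setwise_sum(A, setwise_sum(B, C, n), n)
print(left == right)                     # True
```

The second call illustrates Remark 18.2: taking one of the sets to be a singleton recovers a coset.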


DOI: 10.1201/9781003252139-18




It is natural to ask whether the setwise product of two subgroups must 
also be a subgroup. In general, the answer is no (see Exercise 18.1). But the 
situation is better if one of the subgroups is normal: 


Lemma 18.5. Let $G$ be a group, and suppose that $H \le G$ and $N \trianglelefteq G$. Then
we have $H \cdot N = N \cdot H$ and $H \cdot N \le G$.


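Proof. To prove the first part of the lemma, let $x \in H \cdot N$. Then $x = a \cdot b$ for
some $a \in H$ and some $b \in N$. So $x \in aN$. But $aN = Na$ by Lemma 7.25, so
$x = c \cdot a$ for some $c \in N$. Thus, $x \in N \cdot H$. This shows that $H \cdot N \subseteq N \cdot H$. A
similar argument shows the opposite inclusion, so we have $H \cdot N = N \cdot H$.

To prove that $H \cdot N \le G$, we use the subgroup test. First, we have $e \in H$
and $e \in N$, so $e \cdot e \in H \cdot N$. Thus $H \cdot N$ is non-empty. Next, let $x \in H \cdot N$,
and write $x = a \cdot b$ with $a \in H$ and $b \in N$. Then we have $x^{-1} = b^{-1} \cdot a^{-1}$
(by Exercise 3.15). Now $b^{-1} \in N$ and $a^{-1} \in H$, so we have $x^{-1} \in N \cdot H$. But
$N \cdot H = H \cdot N$ by the first part of our proof. Therefore, $H \cdot N$ is closed under
inverses. Finally, choose $y \in H \cdot N$, and write $y = c \cdot d$ with $c \in H$ and $d \in N$.
Then we have $x \cdot y = (a \cdot b) \cdot (c \cdot d) = a \cdot (b \cdot c) \cdot d$. Now $b \cdot c \in N \cdot H = H \cdot N$, so
we can write $b \cdot c = \tilde{c} \cdot \tilde{b}$ for some $\tilde{b} \in N$ and $\tilde{c} \in H$. Therefore,
$x \cdot y = a \cdot (\tilde{c} \cdot \tilde{b}) \cdot d = (a \cdot \tilde{c}) \cdot (\tilde{b} \cdot d)$. Since $H$ and $N$ are subgroups of $G$, we have
$a \cdot \tilde{c} \in H$ and $\tilde{b} \cdot d \in N$. Therefore, $x \cdot y \in H \cdot N$, so $H \cdot N$ is closed under the
group operation of $G$. This completes the proof.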


Corollary 18.6. Let $G$ be an abelian group, and let $H_1, H_2$ be subgroups of
$G$. Then the setwise sum $H_1 + H_2$ is a subgroup of $G$.


Proof. Since $G$ is abelian, every subgroup of $G$ is normal, by Exercise 11.4.
The result then follows immediately from Lemma 18.5. Alternatively, since
$H_1 + H_2$ is the image of the natural map from $H_1 \oplus H_2$ to $G$, the result follows
from Theorem 7.13.


Notation 18.7. Let $G$ be an abelian group, and let $(G_i)_{i \in S}$ be an indexed
collection of subgroups of $G$, where the index set $S$ is finite. We will use the
notation $\sum_{i \in S} G_i$ to denote the setwise sum of the $G_i$’s. Note that since $G$ is
abelian, the order of the summands does not matter. Also note that this sum
is a subgroup of $G$, by repeated application of Corollary 18.6.


We next state a condition for an abelian group to split as a direct sum, 
which is often easier to check than that in Proposition 17.20. 


Proposition 18.8. Let $G$ be an abelian group, and let $\{G_i : i \in S\}$ be a
finite collection of subgroups of $G$. Suppose that

(i) For each $g \in G$, we can write $g = \sum_{i \in S} g_i$ with $g_i \in G_i$, and

(ii) For each $i \in S$, we have

$$G_i \cap \Big( \sum_{j \in S - \{i\}} G_j \Big) = \{0_G\},$$

the trivial subgroup.
Then the natural map $\bigoplus_{i \in S} G_i \to G$ is an isomorphism.




Proof. We will show that the $g_i$’s in condition (i) are unique. Suppose that
$\sum_{i \in S} g_i = \sum_{i \in S} h_i$ with $g_i, h_i \in G_i$. Let $j \in S$. Then solving for the $j$
components and taking advantage of the fact that $G$ is abelian, we have

$$g_j - h_j = \sum_{i \in S - \{j\}} (h_i - g_i). \tag{18.2}$$

Now the left side of Equation 18.2 belongs to $G_j$, while the right side belongs
to $\sum_{i \in S - \{j\}} G_i$. By hypothesis (ii), this forces both sides to be $0_G$, the identity
element of $G$. Therefore, $g_j = h_j$. Since $j$ was an arbitrary element of $S$, the
uniqueness of the $g_i$’s follows. Now the proposition follows from Proposition
17.20.


In case two subgroups are finite, there is a beautiful and useful formula for 
the size of their setwise product. 


Lemma 18.9. Let $G$ be a group, and suppose that $H_i \le G$ and $|H_i| < \infty$ for
$i \in \{1, 2\}$. Then we have

$$|H_1 \cdot H_2| = |H_1| \cdot |H_2| \, / \, |H_1 \cap H_2|. \tag{18.3}$$

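Proof. Consider the function

$$\sigma : H_1 \times H_2 \to H_1 \cdot H_2, \qquad (a, b) \mapsto a \cdot b. \tag{18.4}$$

From the definition of $H_1 \cdot H_2$, we see that $\sigma$ is surjective. Let $w \in H_1 \cdot H_2$,
and consider the set $P := \sigma^{-1}(w)$. Since $\sigma$ is surjective, $P$ is non-empty; let
us fix an element $(a, b) \in P$. Then $w = a \cdot b$.

Define a function $\tau : H_1 \cap H_2 \to P$ by $q \mapsto (a \cdot q, q^{-1} \cdot b)$. Note that $\tau(q)$
does lie in $P$, since $(a \cdot q) \cdot (q^{-1} \cdot b) = a \cdot b = w$. Suppose
that $(x, y) \in P$. Then we have $x \cdot y = w = a \cdot b$, and so $a^{-1} \cdot x = b \cdot y^{-1}$. Set
$z = a^{-1} \cdot x$. Since $a, x \in H_1 \le G$, we have $z \in H_1$; similarly, $z = b \cdot y^{-1} \in H_2$.
Thus $z \in H_1 \cap H_2$. Therefore, we have $\tau(z) = (a \cdot z, z^{-1} \cdot b) = (x, y)$, so $\tau$ is
surjective. If $\tau(w_1) = \tau(w_2)$, then $a \cdot w_1 = a \cdot w_2$, and so $w_1 = w_2$. Therefore,
$\tau$ is injective.

We now know that $\tau$ is a bijection, and so $|P| = |H_1 \cap H_2|$. Since $P$ is the
pre-image under $\sigma$ of an arbitrary point in $H_1 \cdot H_2$, and since $\sigma$ is surjective,
the $|H_1| \cdot |H_2|$ elements of $H_1 \times H_2$ are partitioned into $|H_1 \cdot H_2|$ pre-images,
each of size $|H_1 \cap H_2|$. The desired formula follows.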
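The counting argument in this proof can be checked numerically. The sketch below works in the abelian group $\mathbb{Z}_{12}$ (so the setwise product is a setwise sum), with $H_1 = \langle 2 \rangle$ and $H_2 = \langle 3 \rangle$; the helper `cyclic_subgroup` and the choice of example are our own assumptions, made only for illustration.

```python
from collections import Counter

def cyclic_subgroup(g, n):
    """The subgroup of Z_n generated by g."""
    return {(k * g) % n for k in range(n)}

n = 12
H1 = cyclic_subgroup(2, n)   # order 6
H2 = cyclic_subgroup(3, n)   # order 4

# For each w in H1 + H2, count the pairs (a, b) in H1 x H2 with a + b = w.
# These counts are the sizes of the pre-images P from the proof.
fibers = Counter((a + b) % n for a in H1 for b in H2)

product_size = len(fibers)
intersection_size = len(H1 & H2)

print(product_size, len(H1), len(H2), intersection_size)   # 12 6 4 2
print(set(fibers.values()))                                # {2}
```

Every pre-image has the same size, namely $|H_1 \cap H_2| = 2$, and $12 = 6 \cdot 4 / 2$, in agreement with Equation 18.3.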


18.3 Splitting into p-Subgroups 


We now turn to the application of the preceding ideas to finite abelian groups. 
Our first concern is for subgroups whose order is a power of a given prime 
number, p. 




Definition 18.10. Let $p \in \mathbb{N}$ be a prime number. A p-group is a finite group
whose order is a power of $p$. A p-subgroup of a group $G$ is a p-group which is
a subgroup of $G$.


Proposition 18.11. Let $G$ be a finite abelian group, and let $p \in \mathbb{N}$ be a prime
number. Then there is a unique maximal p-subgroup $G_p$ of $G$. That is, there
exists a unique subgroup $G_p \le G$ such that

(i) $G_p$ is a p-group, and

(ii) If $H$ is any p-subgroup of $G$, then we have $H \subseteq G_p$.


Proof. The set of all p-subgroups of $G$ is non-empty, since the trivial subgroup
is a p-group ($1 = p^0$). Since $|G| < \infty$, there is some p-subgroup $G_p$ of $G$
whose size is largest among all p-subgroups of $G$. By construction, $G_p$ satisfies
condition (i) above. Suppose that $H$ is a p-subgroup of $G$, and consider the
setwise sum $K := G_p + H$. Since $G$ is abelian, Corollary 18.6 applies to give
$K \le G$. From Lemma 18.9, we see that $|K|$ divides $|G_p| \cdot |H|$; since both $G_p$
and $H$ are p-groups, it follows that $K$ is also a p-group. Also, since $0_G \in H$, we
have $G_p \subseteq K$. Thus, $K$ is a p-subgroup of $G$ which contains $G_p$; by maximality
of $|G_p|$, we must have $K = G_p$. Therefore, $H \subseteq G_p + H = K = G_p$, so $G_p$
satisfies condition (ii).

As for uniqueness, suppose that $\tilde{G}_p$ is any subgroup of $G$ satisfying both
(i) and (ii). Then we may substitute $\tilde{G}_p$ for $H$ in condition (ii) for $G_p$ to find
$\tilde{G}_p \subseteq G_p$; reversing the roles gives $G_p \subseteq \tilde{G}_p$. Thus, $\tilde{G}_p = G_p$, as desired.


The groups $G_p$ turn out to be direct summands of $G$, as the next result
indicates.


Proposition 18.12. Let $G$ be a finite abelian group, and for each prime
number $p$, let $G_p$ denote the maximal p-subgroup of $G$. Then we have

$$G \cong \bigoplus_{p \in S} G_p, \tag{18.5}$$

via the inverse of the natural map $\bigoplus_{p \in S} G_p \to G$, where $S$ is the set of primes
which divide the order of $G$.


Proof. Let $n = |G|$, and for $p \in S$, let $p^{e_p}$ be the exact power of $p$ which divides
$n$; that is, $n = \prod_{p \in S} p^{e_p}$. For each prime $p \in S$, we seek an integer $n_p$ which
satisfies

$$n_p \equiv 1 \pmod{p^{e_p}} \quad \text{and} \tag{18.6}$$
$$n_p \equiv 0 \pmod{q^{e_q}}, \quad \text{if } q \in S \text{ and } q \ne p. \tag{18.7}$$

By the Chinese Remainder Theorem (Corollary 17.12), there is a unique
solution to this system of congruences modulo $n$. For each element $a \in G$, we
set $a_p = n_p \cdot a$.

Note that we have $n_p p^{e_p} \equiv 0 \pmod{q^{e_q}}$ for every prime $q \in S$, and
therefore $n_p p^{e_p} \equiv 0 \pmod{n}$. It follows that we have $p^{e_p} \cdot a_p = 0_G$, by Lemma 8.24.
Thus, the order of $\langle a_p \rangle$ must divide $p^{e_p}$, by Exercise 5.7. Therefore, $\langle a_p \rangle$ is a
p-subgroup of $G$, and so we have $\langle a_p \rangle \subseteq G_p$. In particular, $a_p \in G_p$.

Next, set $k = \sum_{q \in S} n_q$. Note that for each prime $p \in S$, we have $k \equiv 1
\pmod{p^{e_p}}$, since $k$ is a sum of terms of which one is congruent to 1 and the
rest are congruent to 0 modulo $p^{e_p}$. By the Chinese Remainder Theorem, it
follows that $k \equiv 1 \pmod{n}$. Let $v = |a|$ be the order of $a$ as an element of
$G$. Then $v \mid n$ by Lemma 8.24, so we also have $k \equiv 1 \pmod{v}$. By Exercise
5.7, we have $k \cdot a = 1 \cdot a = a$ since $k \equiv 1 \pmod{v}$. Therefore,
$\sum_{p \in S} a_p = \sum_{p \in S} n_p \cdot a = k \cdot a = a$.

Finally, fix $p \in S$, and set $G_p' = \sum_{q \in S - \{p\}} G_q$. Note that the order of $G_p'$ is
a divisor of $\prod_{q \in S - \{p\}} q^{e_q}$, and hence is relatively prime to $p$. The intersection
$G_p \cap G_p'$ is a subgroup of both $G_p$ and $G_p'$, by Lemma 4.20. Therefore, its
order, $|G_p \cap G_p'|$, divides both $|G_p|$ and $|G_p'|$, by Lagrange’s Theorem. Since
these two numbers are relatively prime, it follows that $G_p \cap G_p' = \{0_G\}$. Now
the proposition follows from Proposition 18.8.
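The integers $n_p$ in this proof can be computed explicitly. The following Python sketch does so for the cyclic group $G = \mathbb{Z}_{360}$ and splits the element $a = 7$ into its components $a_p = n_p \cdot a$. The function names `prime_power_factors` and `crt_coefficients`, and the choice of $n = 360$ and $a = 7$, are hypothetical and serve only as an illustration under the assumption that $G$ is cyclic.

```python
def prime_power_factors(n):
    """Return {p: p**e_p} for the prime factorization of n (trial division)."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 1) * p
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 1) * n
    return factors

def crt_coefficients(n):
    """Return {p: n_p} with n_p = 1 mod p^e_p and n_p = 0 mod q^e_q for q != p."""
    coeffs = {}
    for p, pe in prime_power_factors(n).items():
        m = n // pe                          # product of the other prime powers
        # pow(m, -1, pe) is the inverse of m modulo pe (Python 3.8 or later).
        coeffs[p] = (m * pow(m, -1, pe)) % n
    return coeffs

n = 360                                      # 360 = 8 * 9 * 5
coeffs = crt_coefficients(n)
a = 7
components = {p: (n_p * a) % n for p, n_p in coeffs.items()}

print(coeffs)                                # {2: 225, 3: 280, 5: 216}
print(sum(coeffs.values()) % n)              # 1, the integer k reduced mod n
print(sum(components.values()) % n == a)     # True: a is the sum of the a_p
```

Each component $a_p$ is annihilated by $p^{e_p}$, so it lies in the maximal p-subgroup, and the components add back up to $a$, exactly as in the proof.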


Corollary 18.13. Let $G$ be a finite abelian group of order $n$. Let $n = \prod_{p \in S} p^{e_p}$
be the prime factorization of $n$. Let $G_p$ be the maximal p-subgroup of $G$. Then
we have $|G_p| = p^{e_p}$.


Proof. By Proposition 18.12, we have $G \cong \bigoplus_{p \in S} G_p$, and so $|G| = |\bigoplus_{p \in S} G_p| =
\prod_{p \in S} |G_p|$. Since $G_p$ is a p-group, our result follows from unique factorization
in $\mathbb{Z}$.


18.4 Structure of Abelian p-Groups 


Now we know that every finite abelian group decomposes into a direct sum 
of p-groups. A natural subject of further inquiry is the structure of abelian 
p-groups. 

Currently, we have a modest set of machinery available to construct abelian
p-groups. Namely, we can form cyclic p-groups and take direct sums. The end
result of this construction will be a group of the form

$$G = \bigoplus_{i=1}^{r} G_i \tag{18.8}$$

where $G_i = \langle g_i \rangle$ and $|g_i| = p^{e_i}$ for arbitrary positive integers $e_i$.

Given the group $G$, how can we recover the elements $g_i$ and obtain the
decomposition of Equation 18.8? Let us begin with the simplest case, when
$r = 1$ (we ignore the case $r = 0$, which yields the trivial group). In this case,
dropping the subscripts, we have $G = \langle g \rangle \cong \mathbb{Z}_{p^e}$.




Now, we cannot quite hope to recover g given G alone, because of symmetry 
considerations: in general, G will have many different generators, related to 
each other by automorphisms of G (see Exercise 18.2). So we must be content 
to find any element which generates G. This is equivalent to finding an element 
in G of maximal order. 

We could simply ask for an element whose order is $|G|$; but this approach
does not generalize to the case when $G$ is not cyclic. Instead, we dissect $G$
into equivalence classes according to the order of its elements, and ask how
we can isolate elements at the top level of this hierarchy. A fruitful idea is the
following: if $|g| = p^k$ where $k > 0$, then $|pg| = p^{k-1}$. This fact can be verified
at once: $p^{k-1} \cdot (p \cdot g) = p^k \cdot g = 0$, so $|pg|$ divides $p^{k-1}$; but on the other hand,
if $m \cdot (pg) = 0$ with $m > 0$, then $(pm) g = 0$, so $pm \ge p^k$ and $m \ge p^{k-1}$. (It
also follows from Exercise 18.3.)

We claim that the subgroup $\langle pg \rangle =: H$ of $G$ contains all elements of order
less than $p^k$, that is, all elements lower than the top level. Certainly, every
element of $H$ has order at most $|H| = p^{k-1}$. On the other hand, if $\tilde{g} \in G - H$,
then $\tilde{g} = c \cdot g$ for some $c \in \mathbb{N}$ with $p \nmid c$, so $|\tilde{g}| = p^k$ (by Exercise 18.3).

Finally, to isolate the elements in $G$ of maximal order, we consider the
quotient group $G/H$. A non-trivial element of $G/H$ is a coset $a + H$ where
$a \in G - H$, and thus gives an element $a$ of $G$ which has maximal order.
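The claim that $H = \langle pg \rangle$ holds exactly the elements below the top level can be tested on a small case. The sketch below takes $G = \mathbb{Z}_{27}$ with generator $g = 1$ and $p = 3$, and computes orders by brute force; the example and the helper `order` are our own choices, not taken from the text.

```python
def order(a, n):
    """Order of a in the additive group Z_n, found by brute force."""
    m = 1
    while (m * a) % n != 0:
        m += 1
    return m

p, k = 3, 3
n = p ** k                       # G = Z_27, generated by g = 1
G = set(range(n))
H = {(p * a) % n for a in G}     # H = <p g>, the multiples of 3

orders_in_H = {order(a, n) for a in H}
orders_outside_H = {order(a, n) for a in G - H}

print(sorted(orders_in_H))       # [1, 3, 9]
print(sorted(orders_outside_H))  # [27]
```

Every element outside $H$ has the maximal order $27$, so each non-trivial coset of $H$ consists entirely of generators of $G$.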

To generalize these ideas, we replace $\langle pg \rangle$ by the set of all $p$-multiples of
elements of $G$:


Notation 18.14. Let $G$ be an abelian group and let $k \in \mathbb{Z}$. Then

$$kG := \{ka : a \in G\}.$$

Note that when $G = \langle g \rangle$, we have $pG = \langle pg \rangle$. We also have:


Lemma 18.15. If $G$ is an abelian group and $k \in \mathbb{Z}$, then $kG \le G$.


Proof. This follows from Lemma 3.44 and the Laws of Exponents (Theorem 
3.41) for groups. 


Now given an abelian p-group $G$, to see whether $G$ decomposes as in
Equation 18.8, our strategy is to look at the quotient group $G/pG$. Since by
definition the $p$th multiple of every element of $G$ lies in $pG$, it follows that every
element of $G/pG$ has order at most $p$. Before stating and proving this result
formally, we make the following definitions.


Definition 18.16. Let $G$ be a group. We say that $G$ has exponent $k$ (where
$k$ is a positive integer) if for every $g \in G$ we have $g^k = e$ (or, in additive
notation, $kg = 0$) and $k$ is the minimum such number.


Definition 18.17. Let p be prime in N. An elementary abelian p-group is 
an abelian p-group of exponent at most p. 


Remark 18.18. An abelian p-group of exponent at most p must have exponent 
p or 1. In the latter case, the group must be trivial. 




Lemma 18.19. Let G be an abelian p-group. Then G/pG is an elementary 
abelian p-group. 


Proof. First note that $pG \le G$ by Lemma 18.15, so $pG \trianglelefteq G$ by Exercise
11.4. Thus the quotient $G/pG =: Q$ is defined; and this quotient is abelian by
Exercise 11.5. Since $|G|$ is a power of $p$ and $|Q| = |G|/|pG|$, then $|Q|$ is also
a power of $p$; so $Q$ is a p-group. Finally, let $\bar{a} \in Q$. Then we have $\bar{a} = a + pG$
for some $a \in G$. So $p \cdot \bar{a} = pa + pG$. But $pa \in pG$, so we have $pa + pG = 0 + pG$
(by Exercise 8.2) $= 0_Q$. Thus $Q$ has exponent at most $p$, which completes the
proof.
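As a sanity check on Lemma 18.19, the following Python sketch builds the quotient $G/pG$ as a set of cosets for the hypothetical example $G = \mathbb{Z}_8 \oplus \mathbb{Z}_4 \oplus \mathbb{Z}_2$ with $p = 2$. Elements are stored as tuples, and the helper names `add`, `times`, and `coset` are ours.

```python
from itertools import product

p = 2
MODULI = (8, 4, 2)     # G = Z_8 + Z_4 + Z_2, a hypothetical example

def add(x, y):
    """Componentwise addition in G."""
    return tuple((a + b) % m for a, b, m in zip(x, y, MODULI))

def times(k, x):
    """The integer multiple k * x in G."""
    return tuple((k * a) % m for a, m in zip(x, MODULI))

G = set(product(*(range(m) for m in MODULI)))
pG = {times(p, x) for x in G}

def coset(x):
    """The coset x + pG, stored as a frozenset so cosets can be compared."""
    return frozenset(add(x, h) for h in pG)

Q = {coset(x) for x in G}          # the quotient G/pG

print(len(G), len(pG), len(Q))     # 64 8 8
# p times any coset is the zero coset pG, so Q has exponent at most p.
print(all(coset(times(p, x)) == frozenset(pG) for x in G))   # True
```

Here $|G/pG| = 8 = 2^3$, one factor of $2$ for each cyclic summand of $G$.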


When we use additive notation for an elementary abelian p-group $G$, we
find that given any two elements $a, b \in G$ and any $m \in \mathbb{Z}$, we can form the
sum $a + b$ and the product $m \cdot a$. Further, since $G$ has exponent $p$, we have
$p \cdot a = 0$ (the identity element of $G$). It follows from Exercise 5.7 that $m \cdot a$
only depends on $m$ modulo $p$. A flash of insight shows us that it is not so
much the integer $m$ that is multiplying $a$ in the expression $m \cdot a$, but rather
the coset $m + p\mathbb{Z}$, an element of $\mathbb{Z}/p\mathbb{Z}$. Now $\mathbb{Z}/p\mathbb{Z}$ is a field, and so it seems we
may be dealing with a vector space. The following result confirms this hunch.


Lemma 18.20. Let $(G, +)$ be an elementary abelian p-group, where $p$ is
prime. Let $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$. Then the operation $\cdot : \mathbb{F}_p \times G \to G$ given by
$(a + p\mathbb{Z}) \cdot g = a \cdot g$ makes $G$ into an $\mathbb{F}_p$-vector space.


Proof. The main point is to show that $\cdot$ is well-defined; this follows from
Exercise 5.7. The vector space axioms follow from the Laws of Exponents,
Theorem 3.41 and Lemma 3.44. We leave the details to the reader in Exercise
18.5 (a).


With vector spaces come many useful notions, including the notions of
a basis and dimension. Exploiting the natural vector space structure of an
elementary abelian p-group, we find below that every elementary abelian
p-group is a direct sum of cyclic p-groups. The reader is encouraged to verify
that the isomorphism of Corollary 18.22 below can be realized by mapping
a basis of the vector space $G$ to a set of generators for the terms $\mathbb{Z}_p$ in the
direct sum.


Notation 18.21. For consistency and clarity of notation, when $p$ is a positive
prime integer we reserve $\mathbb{Z}_p$ to stand for the additive group $(\mathbb{Z}/p\mathbb{Z}, +)$ and $\mathbb{F}_p$
for the field $(\mathbb{Z}/p\mathbb{Z}, +, \cdot)$. Thus, in the following result, the isomorphism is a
group isomorphism.


Corollary 18.22. Let $G$ be an elementary abelian p-group of order $p^r$. Then
$G \cong \bigoplus_{i=1}^{r} \mathbb{Z}_p$.


Proof. Endow $G$ with the natural vector space structure of Lemma 18.20.
Since $G$ is finite as a set, and $G$ certainly spans $G$ as an $\mathbb{F}_p$-vector space,
there must be a minimal spanning set $B$ for $G$ over $\mathbb{F}_p$. This set $B$ must be a
basis of $G$ over $\mathbb{F}_p$, by Theorem 13.23. By Exercise 13.12, we have $G \cong \mathbb{F}_p^d$ as
$\mathbb{F}_p$-vector spaces, where $d = |B|$. We must have $d = r$ since $p^r = |G| = |\mathbb{F}_p^d|
= p^d$. Notice that $(\mathbb{F}_p^d, +) = \bigoplus_{i=1}^{d} \mathbb{Z}_p$. Since a vector space isomorphism is also
an isomorphism of additive groups, our result follows.


Suppose that we are given a finite direct sum of cyclic p-groups

$$G = \bigoplus_{i=1}^{r} \langle g_i \rangle. \tag{18.9}$$


Results such as Proposition 17.20 and Corollary 17.21 lead us to think that,
as in the case of a basis of a vector space, a set of generators $\{g_i\}$ in such a
direct sum decomposition must generate the group $G$ “efficiently.” One possible
measure of such “efficiency” (which worked for vector space bases) is that every
element of $G$ should have a unique representation in the form

$$c_1 g_1 + c_2 g_2 + \cdots + c_r g_r \tag{18.10}$$

with integer coefficients $c_i$. Now, we cannot hope for such a condition to be
true in a finite group, because Expression 18.10 only depends on the values $c_i$
modulo the order of $g_i$. So we should restrict these values to $0 \le c_i < |g_i|$. The
total number of such expressions is equal to the product of the $|g_i|$. Therefore,
a necessary condition for Equation 18.9 to hold is that the product of the
orders of the $g_i$ is equal to the number of elements of $G$; note that $|G|$ is a
lower bound for the value of the product of the orders of the $g_i$ given that
they generate $G$.

The following result utilizes this approach to prove that every finite abelian 
p-group must in fact be a direct sum of the groups generated by such elements. 


Proposition 18.23. Let $G$ be an abelian p-group. Then $G$ is a direct sum
of cyclic groups. Further, if $g_1, \ldots, g_n$ are elements of $G$ which generate $G$
as a group, then the natural map $\sigma : \bigoplus_{i=1}^{n} \langle g_i \rangle \to G$ is an isomorphism iff
$\prod_{i=1}^{n} |g_i| = |G|$.


Proof. Suppose that $G$ is an abelian p-group. Let $S = \{g_1, \ldots, g_n\}$ be a subset
of $G$ such that $\langle S \rangle = G$, and such that the product of the orders of the $g_i$ is
minimal. Set $G_i = \langle g_i \rangle$. Assume for a contradiction that there is a relation of
the form $x_1 + \cdots + x_n = 0_G$ with $x_i \in G_i$ and $x_i$ not all $0_G$.

Since $x_i \in \langle g_i \rangle$, by Lemma 5.6 we can write $x_i = k_i g_i$ for some $k_i \in
\{0, 1, 2, \ldots, r_i - 1\}$, where $r_i = |g_i|$. Since not all of the $x_i$ are $0_G$, then not
all of the $k_i$ can be 0, so there is a maximal power $p^t$ of $p$ which divides all
of the $k_i$. By maximality, there is some $i$ for which $p^t \parallel k_i$ (note: this is read
“$p^t$ exactly divides $k_i$,” and means that $p^t$ divides $k_i$ but $p^{t+1}$ does not divide
$k_i$); without loss of generality, we have $p^t \parallel k_1$.

Now set $c_i = k_i / p^t$ for all $i$ such that $1 \le i \le n$. Set $\tilde{g}_1 = \sum_{i=1}^{n} c_i g_i$, and
set $\tilde{g}_i = g_i$ if $2 \le i \le n$. Finally, set $\tilde{S} = \{\tilde{g}_1, \ldots, \tilde{g}_n\}$. Notice that $p \nmid c_1$,
but $|g_1|$ is a power of $p$ by Lagrange’s Theorem. Therefore, $\gcd(c_1, |g_1|) = 1$;
and so we have $\langle c_1 g_1 \rangle = \langle g_1 \rangle$ by Exercise 18.3. It follows that
$g_1 \in \langle c_1 g_1 \rangle = \langle \tilde{g}_1 - \sum_{i=2}^{n} c_i g_i \rangle = \langle \tilde{g}_1 - \sum_{i=2}^{n} c_i \tilde{g}_i \rangle \subseteq \langle \tilde{S} \rangle$.
Since we also have $g_i = \tilde{g}_i \in \langle \tilde{S} \rangle$ for
$i \ge 2$, we see that $S \subseteq \langle \tilde{S} \rangle$, so $G = \langle S \rangle \subseteq \langle \tilde{S} \rangle$ and $\langle \tilde{S} \rangle = G$. But in going from
$S$ to $\tilde{S}$ we have reduced the product of the orders of the $g_i$, since $|\tilde{g}_i| = |g_i|$
for $i \ge 2$ and $|\tilde{g}_1| \le p^t \le k_1 < |g_1|$. (The first inequality holds because
$p^t \tilde{g}_1 = \sum_{i=1}^{n} k_i g_i = 0_G$.)

This contradiction shows that there is no non-trivial relation among the
$G_i$’s, and so the natural map $\bigoplus_{i=1}^{n} G_i \to G$ is an isomorphism by Corollary
17.21. Now we have $|G| = \prod_{i=1}^{n} |G_i| = \prod_{i=1}^{n} |g_i|$, which finishes the proof.
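The criterion of Proposition 18.23 is easy to test by machine on a small group. The sketch below uses the hypothetical example $G = \mathbb{Z}_4 \oplus \mathbb{Z}_2$ and compares two generating sets; all helper names (`add`, `times`, `order`, `span`) are our own and are not part of the text.

```python
from itertools import product
from math import prod

MODULI = (4, 2)   # G = Z_4 + Z_2, a hypothetical example with p = 2
ZERO = (0, 0)

def add(x, y):
    return tuple((a + b) % m for a, b, m in zip(x, y, MODULI))

def times(c, x):
    return tuple((c * a) % m for a, m in zip(x, MODULI))

def order(x):
    """Smallest m >= 1 with m * x = 0."""
    m = 1
    while times(m, x) != ZERO:
        m += 1
    return m

def span(gens):
    """All sums c_1 g_1 + ... + c_n g_n with 0 <= c_i < |g_i|."""
    elems = set()
    for coeffs in product(*(range(order(g)) for g in gens)):
        total = ZERO
        for c, g in zip(coeffs, gens):
            total = add(total, times(c, g))
        elems.add(total)
    return elems

G = set(product(range(4), range(2)))
good = [(1, 0), (0, 1)]   # orders 4 and 2
bad = [(1, 0), (1, 1)]    # orders 4 and 4

for gens in (good, bad):
    order_product = prod(order(g) for g in gens)
    print(gens, span(gens) == G, order_product, order_product == len(G))
```

Both sets generate $G$, but only the first has order product $8 = |G|$. For the second set the product is $16$, and indeed there is a non-trivial relation $2 \cdot (1,0) + 2 \cdot (1,1) = (0,0)$, so the natural map is not injective.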


Example 18.24. Suppose that $G$ is an abelian group of order $7^3$. Then by
Proposition 18.23, $G$ must be a direct sum of cyclic groups. There aren’t very
many ways for this to happen, since each non-trivial cyclic direct summand
of $G$ must have order $7^m$ where $m \in \{1, 2, 3\}$, and these powers must total 3.
Here are the possibilities: $G \cong \mathbb{Z}/7\mathbb{Z} \oplus \mathbb{Z}/7\mathbb{Z} \oplus \mathbb{Z}/7\mathbb{Z}$ or $G \cong \mathbb{Z}/7^2\mathbb{Z} \oplus \mathbb{Z}/7\mathbb{Z}$ or
$G \cong \mathbb{Z}/7^3\mathbb{Z}$. Notice that each of these possibilities really is distinct, because
the exponents of these groups are $7$, $7^2$, and $7^3$, respectively, so no two of these
groups are isomorphic.


Generalizing this example, we see that an abelian group of order $p^n$, where
$p$ is prime, corresponds to a way of writing $n$ as a sum of positive integers.
Indeed, if $n = \sum_{i=1}^{k} m_i$ with $m_i \in \mathbb{Z}^+$, then we can form the group $\bigoplus_{i=1}^{k} \mathbb{Z}/p^{m_i}\mathbb{Z}$,
which is an abelian group of order $p^n$; and by Proposition 18.23, every abelian
group of order $p^n$ has this form. Because changing the order of the direct
summands does not change the result (up to isomorphism), we are free to require
$m_1 \ge m_2 \ge \cdots \ge m_k$.
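The non-increasing sequences $m_1 \ge \cdots \ge m_k$ with sum $n$ can be listed mechanically. The following Python sketch enumerates them and prints the corresponding direct sums for $p = 7$ and $n = 3$, reproducing the list of Example 18.24. The names `partitions` and `describe` are hypothetical helpers, and the printed strings use `+` in place of $\oplus$.

```python
def partitions(n, largest=None):
    """Yield the non-increasing sequences of positive integers summing to n."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def describe(p, parts):
    """A plain-text name for the direct sum of the groups Z/p^m Z, m in parts."""
    return " + ".join(f"Z/{p}^{m}Z" for m in parts)

for parts in partitions(3):
    print(parts, describe(7, parts))

print(len(list(partitions(5))))   # 7 sequences with sum 5
```

This only produces candidate groups; whether different sequences can give isomorphic groups is the question taken up next.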

It is natural to ask whether two different non-increasing sequences $\ell_1 \ge
\cdots \ge \ell_j$ and $m_1 \ge \cdots \ge m_k$ necessarily give rise to non-isomorphic groups
under our direct sum construction. The reader is encouraged at this point to
experiment with examples to arrive at a conjecture about this question.

Let us attempt to form a strategy to prove that an abelian p-group in fact 
determines a unique non-increasing exponent sequence. The idea is that this 
sequence should be somehow “intrinsic” to the group, that is, invariant under 
isomorphism. We wish to study this notion of invariance under isomorphism 
more generally and precisely. To do this, it is more convenient to talk about 
a subgroup H which is invariant under isomorphism than about sequences of 
numbers. But at most, we can hope for H to be invariant only under those 
isomorphisms which carry the group G to itself: 


Definition 18.25. Let $G$ be a group, and let $H \le G$. We say that $H$ is a
characteristic subgroup of $G$ if for all automorphisms $\sigma$ of $G$, we have $\sigma(H) =
H$.


Remark 18.26. An automorphism must fix a characteristic subgroup as a set, 
but not necessarily pointwise. 


Remark 18.27. The concept of a characteristic subgroup is completely separate 
from the notion of the characteristic of a commutative ring (Definition 16.21). 




As a rule of thumb, we expect that any subgroup which can be defined 
“generically”—that is, in an arbitrary group (or arbitrary abelian group)— 
should be a characteristic subgroup. Examples include the center of a group 
and the commutator subgroup (Exercise 18.7). Similarly, we have: 


Lemma 18.28. Let $G$ be an abelian group, and let $k \in \mathbb{Z}$. Then $kG$ is a
characteristic subgroup of $G$.


Proof. Let $\sigma \in \mathrm{Aut}(G)$, and let $a \in kG$. Then $a = kg$ for some $g \in G$.
Therefore, $\sigma(a) = \sigma(kg) = k\sigma(g)$ (by Proposition 7.4) $\in kG$. This shows that
$\sigma(kG) \subseteq kG$. Since $\sigma^{-1} \in \mathrm{Aut}(G)$, we similarly obtain $\sigma^{-1}(kG) \subseteq kG$, so
that $kG \subseteq \sigma(kG)$. Thus $\sigma(kG) = kG$, as desired.


We also expect intrinsically defined characteristic subgroups to be
preserved under isomorphism:


Lemma 18.29. Let $k \in \mathbb{Z}$, and suppose that $\sigma : G \to H$ is an isomorphism
of abelian groups. Then we have $\sigma(kG) = kH$.


Proof. The proof is similar to that of Lemma 18.28. 


What does $kG$ actually look like for an abelian group $G$? Consider the
special case that $G = \mathbb{Z}/p^e\mathbb{Z}$ is a cyclic p-group. Then $kG$ is cyclic,
generated by $k + p^e\mathbb{Z}$ (see Exercise 5.6). By Exercise 18.3, we see that $kG \cong
\mathbb{Z}/(p^e / \gcd(k, p^e))\mathbb{Z}$. In particular, when $k = p$ and $e \ge 1$, we find that

$$p(\mathbb{Z}/p^e\mathbb{Z}) \cong \mathbb{Z}/p^{e-1}\mathbb{Z}. \tag{18.11}$$


We can handle a general abelian p-group by combining this equation with the
observation that

$$k(A \oplus B) = (kA) \oplus (kB) \tag{18.12}$$

for every integer $k$ and all abelian groups $A$ and $B$.

We use these facts to prove that the exponent sequence of an abelian
p-group is in fact isomorphism-invariant.
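Equations 18.11 and 18.12 together give a simple rule on exponent sequences: passing from $G$ to $pG$ lowers every exponent by one and discards the summands that become trivial. The Python sketch below applies this rule repeatedly and records $\log_p |p^k G|$ for $k = 0, 1, 2, \ldots$; the function names `p_multiple` and `order_profile`, and the two sample sequences, are our own illustrative assumptions.

```python
def p_multiple(exponents):
    """Exponent sequence of pG, where G is the direct sum of Z/p^e Z, e in exponents.

    Uses p(Z/p^e Z) = Z/p^(e-1) Z and k(A + B) = kA + kB; trivial summands are dropped.
    """
    return tuple(e - 1 for e in exponents if e > 1)

def order_profile(exponents):
    """The list [log_p |G|, log_p |pG|, log_p |p^2 G|, ...], ending at the trivial group."""
    profile = []
    current = tuple(exponents)
    while current:
        profile.append(sum(current))
        current = p_multiple(current)
    profile.append(0)
    return profile

print(order_profile((3, 1, 1)))   # [5, 2, 1, 0]
print(order_profile((2, 2, 1)))   # [5, 2, 0]
```

The two sample groups have the same order $p^5$, and even $|pG|$ agrees, but $|p^2 G|$ differs. Since isomorphic groups have isomorphic subgroups $p^k G$ (Lemma 18.29), the two groups cannot be isomorphic.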


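Lemma 18.30. Let $p$ be a prime number, and let $\ell_1 \ge \cdots \ge \ell_j$ and
$m_1 \ge \cdots \ge m_k$ be two non-increasing sequences of positive integers. Then

$$\bigoplus_{i=1}^{j} \mathbb{Z}/p^{\ell_i}\mathbb{Z} \cong \bigoplus_{i=1}^{k} \mathbb{Z}/p^{m_i}\mathbb{Z} \quad \text{iff} \quad j = k \text{ and } \ell_i = m_i \text{ for all } i.$$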


Proof. ($\Leftarrow$): This is trivial. ($\Rightarrow$): Set $G = \bigoplus_{i=1}^{j} \mathbb{Z}/p^{\ell_i}\mathbb{Z}$ and $H = \bigoplus_{i=1}^{k} \mathbb{Z}/p^{m_i}\mathbb{Z}$.
Let $\sigma : G \to H$ be an isomorphism. Let $e = \sum_{i=1}^{j} \ell_i$. We proceed by
induction on $e$. In the base case, $e = 1$, we have $j = 1$, $\ell_1 = 1$, and $G \cong
\mathbb{Z}/p\mathbb{Z} \cong H$, so just from comparing cardinalities, we see that $k = 1$ and
$m_1 = 1$. Inductively, assume that $e > 1$ and that the result is true for all
sequences with sum less than $e$. By Lemma 18.29 we have $\sigma(pG) = pH$,
so the restriction of $\sigma$ to $pG$ gives an isomorphism $pG \cong pH$. Now $pG \cong
\bigoplus_{i=1}^{j} \mathbb{Z}/p^{\ell_i - 1}\mathbb{Z}$ and $pH \cong \bigoplus_{i=1}^{k} \mathbb{Z}/p^{m_i - 1}\mathbb{Z}$, so we would like to use our inductive
hypothesis on these smaller p-groups. The only problem is that some of the
exponents may be 0 here. To address this issue, let us separate the exponent
sequences into $S_0 := \{i : \ell_i = 1\}$, $S_+ := \{i : \ell_i > 1\}$, $T_0 := \{i : m_i = 1\}$,
and $T_+ := \{i : m_i > 1\}$. The terms in the direct sums for $pG$ and $pH$
coming from $S_0$ and $T_0$ do not contribute anything, by Exercise 18.10. Thus
we have $pG \cong \bigoplus_{i \in S_+} \mathbb{Z}/p^{\ell_i - 1}\mathbb{Z}$ and $pH \cong \bigoplus_{i \in T_+} \mathbb{Z}/p^{m_i - 1}\mathbb{Z}$. Now our inductive
hypothesis applies to give $S_+ = T_+$ and $\ell_i - 1 = m_i - 1$ for all $i \in S_+$. Since
$G \cong H$, we have $|G| = |H|$, so $\sum_{i=1}^{j} \ell_i = \sum_{i=1}^{k} m_i$. We can write this as
$\sum_{i \in S_0} \ell_i + \sum_{i \in S_+} \ell_i = \sum_{i \in T_0} m_i + \sum_{i \in T_+} m_i$. The sums involving $S_+$ and $T_+$
are equal, so we have $\sum_{i \in S_0} \ell_i = \sum_{i \in T_0} m_i$. But since $\ell_i = 1$ for all $i \in S_0$ and
$m_i = 1$ for all $i \in T_0$, we have $|S_0| = |T_0|$ and thus $S_0 = T_0$. This completes
the proof.


18.5 The Fundamental Theorem 


Finally, we combine our results on decomposing abelian groups into a summary 
result. 


Theorem 18.31 (Fundamental Theorem of Finite Abelian Groups). Every
finite abelian group is a direct sum of cyclic groups. If $G$ is a finite abelian
group and $S$ is the set of all primes which divide the order of $G$, then we have

$$G \cong \bigoplus_{p \in S} \Big( \bigoplus_{i=1}^{n_p} G_{p,i} \Big), \tag{18.13}$$

where $G_{p,i}$ is cyclic of order $p^{e_{p,i}}$, for uniquely determined positive integers $n_p$
and non-increasing sequences of positive integers $e_{p,1} \ge \cdots \ge e_{p,n_p}$.


Proof. Let $G$ be a finite abelian group. We can decompose $G$ into a direct
sum of abelian p-groups by Proposition 18.12, and then decompose each of these
factors into a direct sum of cyclic groups by Proposition 18.23, to get a
representation of $G$ in the form of Formula 18.13.

Suppose we are given a representation of $G$ as in Formula 18.13. The inner
direct sum gives a maximal p-subgroup of the right-hand side, which must be
isomorphic to $G_p$, the unique maximal p-subgroup of $G$, by Exercise 18.9. The
uniqueness of the decomposition of $G_p$ follows from Lemma 18.30.


We have now shown that every finite abelian group is a direct sum of cyclic 
groups of prime-power order. Our effort at understanding finite abelian groups 
may as well stop here, since by Exercise 18.11, there is no non-trivial way to 
decompose a cyclic p-group. 




18.6 Exercises 


Exercise 18.1. Find an example of a group $G$ and two subgroups $H_1, H_2$ of
$G$ such that the setwise product $H_1 \cdot H_2$ is not a subgroup of $G$. Hint: take
$G = S_3$.


Exercise 18.2. Let $G$ be a finite cyclic group. Suppose that $G = \langle g \rangle = \langle \tilde{g} \rangle$
where $g, \tilde{g} \in G$. Prove that there is a unique automorphism $\sigma \in \mathrm{Aut}(G)$ such
that $\sigma(g) = \tilde{g}$.


Exercise 18.3. Let $G$ be a group, and suppose that $g \in G$ is an element of
finite order $n$.

(a) Prove that we have $|g^a| = n / \gcd(n, a)$ for all $a \in \mathbb{Z}$.

(b) Conclude that $\langle g^a \rangle = \langle g \rangle$ if and only if $\gcd(a, n) = 1$.

(c) Show finally that the number of generators of $\langle g \rangle$ is equal to $\phi(n)$. (See
Exercise 17.7.)


Exercise 18.4. Let G be a finite cyclic group. Prove the following converse of 
Lagrange’s Theorem for G: If m divides the order of G, then G has a subgroup 
of order m. Hint: Exercise 18.3 may be useful. (Also compare Exercise 8.11.) 


Exercise 18.5. Let $G$ be an abelian p-group.

(a) Prove that $G/pG$ is naturally an $\mathbb{F}_p$-vector space, where $\mathbb{F}_p$ denotes
the field $(\mathbb{Z}/p\mathbb{Z}, +, \cdot)$. (See Lemma 18.20.)

(b) Prove that if $\{g_1 + pG, \ldots, g_n + pG\}$ is a basis of $G/pG$ over $\mathbb{F}_p$,
then the set $S := \{g_1, \ldots, g_n\}$ generates $G$ as a group. (This is a version
of a result known as Nakayama’s Lemma.) Hint: Let $H = \langle S \rangle$ and prove
inductively that for each $k \in \mathbb{N}$, every element of $G$ can be written in the
form $h_1 + p h_2 + p^2 h_3 + \cdots + p^{k-1} h_k + p^k a$ for some $h_1, \ldots, h_k \in H$ and some
$a \in G$.

(c) Prove or disprove that for every $S$ constructed as in part (b) above, we
must have $G \cong \bigoplus_{i=1}^{n} \langle g_i \rangle$.


Exercise 18.6. Let $G$ be an abelian group and $A, B \le G$. Prove or disprove
that if $G \cong A \oplus B$, then the natural map $f : A \oplus B \to G$ must be an
isomorphism.


Exercise 18.7. Let G be a group. Prove that the following subgroups of G are 
characteristic: 

(a) The center, Z(G), of G (see Exercise 9.11). 

(b) The commutator subgroup of G (see Definition 10.8). 


Exercise 18.8. Let G be a finite abelian group with |G| = mn and gcd(m,n) = 
1. Prove that nG is the unique subgroup of G of order m. 


Exercise 18.9. Let $G$ be a finite abelian group, and let $G_p$ denote the maximal
p-subgroup of $G$.

(a) Prove that $G_p$ is a characteristic subgroup of $G$.

(b) Prove that if $\sigma : G \to H$ is an isomorphism, then $\sigma(G_p) = H_p$.




Exercise 18.10. Prove that if $G$ is an abelian group and $T$ is a trivial group
(recall this means $|T| = 1$), then we have $G \oplus T \cong G$.


Exercise 18.11. Prove that if $G$ is a cyclic group of prime power order and
$G \cong A \oplus B$ for two abelian groups $A$ and $B$, then either $A$ or $B$ is trivial.


Exercise 18.12. Suppose that $G = \bigoplus_{i=1}^{n} G_i$, where the $G_i$ are finite abelian
groups.

(a) Prove that for any element $g = (g_1, \ldots, g_n)$ of $G$, we have $|g| =
\mathrm{lcm}(|g_1|, \ldots, |g_n|)$.

(b) Prove that $\mathrm{exponent}(G) = \mathrm{lcm}(\mathrm{exponent}(G_1), \ldots, \mathrm{exponent}(G_n))$.


Exercise 18.13. Let $G$ be an abelian p-group of exponent $p^e$, and let $g \in G$
be any element of order $p^e$. Prove that there exists a subgroup $A \le G$ such
that the natural map $\langle g \rangle \oplus A \to G$ is an isomorphism. Thus, $\langle g \rangle$ is a “natural”
direct summand of $G$.


Exercise 18.14. Prove that if G is a finite abelian group whose order is divisible 
by the prime number p, then G contains an element of order p. 


Exercise 18.15. Suppose that the natural map $\bigoplus_{i=1}^{n} G_i \to G$ is an isomorphism,
where $G$ is an abelian group and $G_i \le G$ for each $i$. Let $H_i \le G_i$ be subgroups
of $G_i$ ($1 \le i \le n$). Prove that the natural map $\bigoplus_{i=1}^{n} H_i \to H$ is an isomorphism,
where $H := \sum_{i=1}^{n} H_i$.


Exercise 18.16. Prove the following converse of Lagrange’s Theorem for finite 
abelian groups: If G is a finite abelian group and m divides the order of G, 
then G has a subgroup of order m. Hint: Exercises 18.4 and 18.15 may be 
useful here. 


Exercise 18.17. Let G be a finite abelian group. 

(a) Prove that G is cyclic if and only if for every prime p which divides 
the order of G, the maximal p-subgroup of G is cyclic. (G is “cyclic iff locally 
cyclic.”) 

(b) Show that the statement of part (a) is false if we remove the hypoth- 
esis that G is abelian, and we replace “the” maximal p-subgroup with every 
maximal p-subgroup. 


Exercise 18.18. A positive integer $n$ is called squarefree if no perfect square
greater than 1 is a factor of $n$. Prove that every finite abelian group of
squarefree order is cyclic.


Exercise 18.19. For a positive integer $n$, let $C_n$ denote the number of
isomorphism classes of abelian groups of order $n$.

(a) Why must $C_n$ be finite?

(b) Prove that if $\gcd(m, n) = 1$, then $C_{mn} = C_m \cdot C_n$.

(c) Find the number of isomorphism classes of abelian groups of order 
1000000. (You do not need to list all of the isomorphism classes explicitly.) 


Exercise 18.20. Let $G$ be the free abelian group on a set of $n$ elements (see
Exercise 10.5). Prove that $G \cong \bigoplus_{i=1}^{n} \mathbb{Z}$.




19 


Group Actions 


19.1 Groups Acting on Sets 


We have seen several examples of groups whose elements are functions, and 
whose group operation is composition of functions. Here are two examples: 

(1). $\mathrm{Sym}(S)$ is the group of all permutations of the set $S$, whose elements
are bijective functions from $S$ to $S$. If $\sigma, \tau \in \mathrm{Sym}(S)$ and $x \in S$, then we have
$(\sigma \cdot \tau)(x) = \sigma(\tau(x))$. We also have $e(x) = x$, where $e$ is the identity element
of $\mathrm{Sym}(S)$.

(2). $GL_n(F)$ is the group of all invertible $n \times n$ matrices over the field
$F$ (see Exercises 13.15 through 13.17). Since matrices represent linear
transformations, we may identify a matrix $M \in GL_n(F)$ as a linear
transformation $M : V \to V$, for a vector space $V$ over $F$ with $\dim_F(V) = n$ and
a fixed basis of $V$. Again, if $M, N \in GL_n(F)$ and $v \in V$, then we have
$(M \cdot N)(v) = M(N(v))$, and we also have $I(v) = v$, where $I$ is the identity
element of $GL_n(F)$.

We will now attempt to generalize these examples. Given a group G and a 
set S, we want to define what it means for G to “act on” S. We only need to 
capture the idea that elements of G behave like functions from S$ to S under 
composition, and that the identity element of G behaves like the identity 
function. 


Definition 19.1. Let G be a group and let S be a set. A group action of G
on S is a function · : G × S → S such that for all g, h ∈ G and all x ∈ S, we
have

(i) (gh) · x = g · (h · x) and

(ii) e · x = x.


Example 19.2. As expected, if S is any non-empty set, then the group G =
Sym(S) acts on S via the rule σ · x = σ(x) for σ ∈ G and x ∈ S. In fact, if
H ≤ G is any subgroup of Sym(S), then H acts on S in the same way. For
a particular instance, take S = {1, 2, 3, 4, 5},

    h = ( 1 2 3 4 5 )
        ( 4 1 3 2 5 ),

and H = ⟨h⟩. The reader can verify that |h| = 3, so that H = {e, h, h²}. Now
the different elements of S are not treated equally by the action of H: the
elements 3 and 5 always get sent to themselves when we apply any element of
H, whereas the elements 1, 2, and 4 get sent to each other in various ways.


DOI: 10.1201/9781003252139-19




Example 19.2 leads us to ask in general: given a group G acting on a set
S, and an element x ∈ S, what are the possibilities for where x can be sent
by elements of G?


Definition 19.3. Given an action · of a group G on a set S, and an element
x ∈ S, the orbit of x under the action of G is


Orb_G(x) := {g · x : g ∈ G}.


It only makes sense that the more group elements send x to itself, the
smaller the orbit of x can be. The next result captures this idea nicely.


Lemma 19.4 (Orbit-Stabilizer Lemma). Let G be a finite group acting on a
set S. For x ∈ S, define the stabilizer of x in G to be


Stab_G(x) := {g ∈ G : g · x = x}.


Then:
(i) Stab_G(x) ≤ G, and
(ii) |Orb_G(x)| = [G : Stab_G(x)].


Proof. Let x ∈ S.

(i). We use the subgroup test. Since e · x = x (by Definition 19.1, part (ii)),
we have e ∈ Stab_G(x). Let g, h ∈ Stab_G(x). Then we have (g · h) · x = g · (h · x)
(by Definition 19.1, part (i)) = g · x = x. Thus, g · h ∈ Stab_G(x). We also have
g⁻¹ · x = g⁻¹ · (g · x) = (g⁻¹ · g) · x = e · x = x, so g⁻¹ ∈ Stab_G(x).

(ii). Set T = Stab_G(x). We claim that for all a, b ∈ G, we have a · x = b · x
iff a and b belong to the same left coset of T in G. For if a, b ∈ cT for some
c ∈ G, then we have a = c · t for some t ∈ T, and so a · x = (c · t) · x = c · (t · x)
= c · x (since t ∈ Stab_G(x)), and similarly, b · x = c · x. Conversely, if a · x = b · x,
then we have a⁻¹ · (a · x) = a⁻¹ · (b · x), from which we see that x = (a⁻¹ · b) · x;
hence a⁻¹ · b ∈ T, so that both a and b belong to the coset aT. This establishes
a bijection from the set of all left cosets of T in G to the orbit of x under the
action of G, given by aT ↦ a · x. It follows that |Orb_G(x)| is the number of
left cosets of T in G, which is by definition equal to the index of T in G, as
desired.


Example 19.5. Let S, h, and H be as in Example 19.2. Then the orbit of 1
under the action of H is the set {e · 1, h · 1, h² · 1} = {1, 4, 2}. The orbit of 3
under the action of H is the set {e · 3, h · 3, h² · 3} = {3}. The orbit of 5 under
the action of H, similarly, is the set {5}. Notice that the orbit of 2 under this
action is {e · 2, h · 2, h² · 2} = {2, 1, 4}, which is the same as the orbit of 1. We
extrapolate from this example as follows.


Lemma 19.6. Suppose that · is a group action of G on S. Then the orbits of
the elements of S form a partition of S.




Proof. Given two orbits O and O′, we must show that they are either the same
or disjoint. So suppose that they are not disjoint, and let x ∈ O ∩ O′. Since O is
an orbit, we have O = Orb_G(y) for some y ∈ S. Since x ∈ O = {g · y : g ∈ G},
we have

x = a · y (19.1)


for some a ∈ G. Now if b ∈ G, we have b · x = b · (a · y) = (b · a) · y ∈ Orb_G(y).
Therefore, Orb_G(x) ⊆ Orb_G(y). But we can solve Equation 19.1 for y to get
y = a⁻¹ · x, showing that y ∈ Orb_G(x). Thus we also get Orb_G(y) ⊆ Orb_G(x).
We now can say Orb_G(x) = O. But likewise, we must have Orb_G(x) = O′, so
O = O′, as desired.

Finally, every element of S is in some orbit: namely, if x ∈ S, then x ∈
Orb_G(x), since we have e · x = x. This finishes the proof.
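
As a sanity check on Lemmas 19.4 and 19.6, the following minimal Python sketch recomputes the orbits and stabilizers for the action of H = ⟨h⟩ from Examples 19.2 and 19.5. The dictionary encoding of permutations and the helper names (compose, generated_subgroup, orbit, stabilizer) are choices made here for illustration only; they are not notation from the text.

```python
# Permutations of S = {1,...,5} are stored as dicts mapping x -> image of x.
S = [1, 2, 3, 4, 5]
h = {1: 4, 2: 1, 3: 3, 4: 2, 5: 5}   # the permutation h of Example 19.2


def compose(a, b):
    """Return the composition a∘b, i.e. x -> a(b(x))."""
    return {x: a[b[x]] for x in S}


def generated_subgroup(g):
    """List the elements e, g, g^2, ... of the cyclic group <g>."""
    identity = {x: x for x in S}
    elements, power = [identity], g
    while power != identity:
        elements.append(power)
        power = compose(g, power)
    return elements


H = generated_subgroup(h)


def orbit(x):
    """Orb_H(x): every image of x under an element of H."""
    return {g[x] for g in H}


def stabilizer(x):
    """Stab_H(x): the elements of H that fix x."""
    return [g for g in H if g[x] == x]


# Orbit-Stabilizer (Lemma 19.4): |Orb(x)| * |Stab(x)| = |H| for every x.
for x in S:
    assert len(orbit(x)) * len(stabilizer(x)) == len(H)

# Partition (Lemma 19.6): listing the distinct orbits end to end
# reproduces S exactly, so they are pairwise disjoint and cover S.
orbits = {frozenset(orbit(x)) for x in S}
assert sorted(y for O in orbits for y in O) == S
```

Brute force is only feasible because H and S are tiny here; the point is to see both lemmas hold on a concrete action, not to replace the proofs.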


Suppose as in Examples 19.2 and 19.5 that h is a permutation of a finite
set S, and x ∈ S. Then we can form ⟨h⟩ =: H ≤ Sym(S). Now H acts
on S via α · y = α(y) for α ∈ H and y ∈ S. Because H is cyclic, we have
an especially nice way to represent the orbit Orb_H(x). Namely, Orb_H(x) =
{x, h · x, h² · x, ..., h^{n−1} · x}, where n = |h| = |H|.
We note that there may be fewer than n elements in the orbit of x under H.
But we have Stab_H(x) ≤ H, so Stab_H(x) = ⟨h^r⟩ for some r ∈ N with r | n
(by Exercise 8.11). We leave it to the reader to verify that |Orb_H(x)| = r and
Orb_H(x) = {x, h · x, h² · x, ..., h^{r−1} · x}. Supposing as we have that both x and
h are given, then the set Orb_H(x) has a natural ordering given in precisely
this fashion.


Notation 19.7. Let S be a finite set. Let σ ∈ Sym(S), and let x ∈ S. Then the
cycle of σ corresponding to x is the ordered r-tuple (x, σx, σ²x, ..., σ^{r−1}x),
where r = |Orb_⟨σ⟩(x)|. We interpret such r-tuples as permutations in their
own right, as follows: if x_0, x_1, ..., x_{r−1} are distinct elements of S, then by
(x_0, x_1, ..., x_{r−1}) we mean the permutation τ ∈ Sym(S) defined by τ(x_i) =
x_{i+1} if 0 ≤ i < r − 1, τ(x_{r−1}) = x_0, and τ(y) = y for y ∉ {x_0, x_1, ..., x_{r−1}}.
We say that (x_0, x_1, ..., x_{r−1}) is an r-cycle.

Cycle notation is useful for understanding permutations of finite sets; see 
Exercises 19.5 through 19.7. 
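
To make Notation 19.7 concrete, here is a small Python sketch that reads off the cycle of a permutation through a given point and collects one cycle per orbit. The function names cycle_of and disjoint_cycles, and the dictionary encoding of a permutation, are hypothetical helpers introduced for this illustration.

```python
def cycle_of(sigma, x):
    """Return the cycle (x, σx, σ²x, ...) of σ through x, as a tuple."""
    cycle, y = [x], sigma[x]
    while y != x:
        cycle.append(y)
        y = sigma[y]
    return tuple(cycle)


def disjoint_cycles(sigma):
    """Return one cycle per orbit of <σ>, scanning the points in sorted order."""
    seen, cycles = set(), []
    for x in sorted(sigma):
        if x not in seen:
            c = cycle_of(sigma, x)
            seen.update(c)
            cycles.append(c)
    return cycles


# The permutation h of Example 19.2, stored as x -> h(x).
h = {1: 4, 2: 1, 3: 3, 4: 2, 5: 5}
print(disjoint_cycles(h))
```

For the permutation h of Example 19.2 this lists the 3-cycle (1, 4, 2) together with the two fixed points 3 and 5 as 1-cycles, matching the orbits found in Example 19.5.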

The next result strengthens the connection between group actions and 
symmetric groups. 


Lemma 19.8. Let G be a group and let S be a set. Let · be a group action
of G on S. For each g ∈ G, let σ_g : S → S be defined by σ_g(x) = g · x.
Then σ_g ∈ Sym(S), and the function f : G → Sym(S), g ↦ σ_g is a group
homomorphism. The collection of all group actions of G on S is in bijective
correspondence with the set of all group homomorphisms from G to Sym(S)
via the map · ↦ f.


Proof. See Exercise 19.1. 
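
The passage from an action to a homomorphism in Lemma 19.8 can be watched on a toy case. The sketch below assumes, purely for illustration, the additive group Z_6 acting on S = {0, 1, 2} by g · x = (g + x) mod 3; this example and the names act, sigma, and compose do not come from the text.

```python
n, m = 6, 3
G = range(n)   # Z_6 under addition mod 6 (an illustrative choice of group)
S = range(m)   # the set S = {0, 1, 2}


def act(g, x):
    """The assumed action g · x = (g + x) mod 3."""
    return (g + x) % m


def sigma(g):
    """The map σ_g, stored as the tuple (σ_g(0), σ_g(1), σ_g(2))."""
    return tuple(act(g, x) for x in S)


def compose(a, b):
    """Composition a∘b of two maps on S stored as tuples."""
    return tuple(a[b[x]] for x in S)


for g in G:
    # σ_g is a bijection of S, so it lies in Sym(S).
    assert sorted(sigma(g)) == list(S)
    for h in G:
        # f(g + h) = f(g) ∘ f(h): the map g -> σ_g is a homomorphism.
        assert sigma((g + h) % n) == compose(sigma(g), sigma(h))

# The kernel of f consists of the elements acting as the identity on S.
kernel = [g for g in G if sigma(g) == tuple(S)]
print(kernel)
```

In this example f is not injective: its kernel is the subgroup {0, 3} of Z_6, which shows that an action need not embed G into Sym(S).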


An important example of group actions is given by the following result. 




Lemma 19.9. Let G be a group. The function △ : G × G → G given by
g △ h = ghg⁻¹ is a group action of G on G, and for this group action we have
Stab_G(h) = C_G(h) (see Exercise 4.12).

Proof. Let f, g, h ∈ G. We have e △ h = ehe⁻¹ = ehe = h and (fg) △ h =
(fg)h(fg)⁻¹ = fghg⁻¹f⁻¹ = f(g △ h)f⁻¹ = f △ (g △ h), as required. The result
about stabilizers is immediate.


Definition 19.10. The group action described in Lemma 19.9 is called con- 
jugation. An orbit of an element of G under conjugation is called a conjugacy 
class. 
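
A brute-force look at conjugation in S_3 may help fix ideas. The following Python sketch (the tuple encoding and the helper names mul, inv, conj, conjugacy_class, and centralizer are assumptions of this illustration) computes the conjugacy classes and checks that the stabilizer of each element under conjugation is its centralizer, as Lemma 19.9 states.

```python
from itertools import permutations

# Elements of S_3 as tuples p, where p[i] is the image of i in {0, 1, 2}.
G = list(permutations(range(3)))


def mul(a, b):
    """Composition a∘b: first apply b, then a."""
    return tuple(a[b[i]] for i in range(len(a)))


def inv(a):
    """Inverse permutation of a."""
    result = [0] * len(a)
    for i, ai in enumerate(a):
        result[ai] = i
    return tuple(result)


def conj(g, h):
    """The conjugation action: g acting on h gives g h g^{-1}."""
    return mul(mul(g, h), inv(g))


def conjugacy_class(h):
    return {conj(g, h) for g in G}


def centralizer(h):
    return [g for g in G if mul(g, h) == mul(h, g)]


for h in G:
    stab = [g for g in G if conj(g, h) == h]
    assert stab == centralizer(h)                        # Stab_G(h) = C_G(h)
    assert len(conjugacy_class(h)) * len(stab) == len(G)  # Orbit-Stabilizer

classes = {frozenset(conjugacy_class(h)) for h in G}
class_sizes = sorted(len(c) for c in classes)
print(class_sizes)
```

The three classes found this way are the identity, the two 3-cycles, and the three transpositions.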


Example 19.11. In a similar vein, if G is a group and X ⊆ G is any non-
empty subset of G, then we can form the set of all conjugates of X, namely
S := {gXg⁻¹ : g ∈ G}. G acts on S via conjugation in the obvious way:
for g ∈ G and Y ∈ S, let g △ Y = gYg⁻¹. Since Y has the form Y = hXh⁻¹
for some h ∈ G, we have gYg⁻¹ = ghXh⁻¹g⁻¹ = (gh)X(gh)⁻¹ ∈ S, so this
action is well-defined.

Example 19.12. Set F = Q and K = Q[⁴√2, i], so that K is the unique splitting
field for f := x⁴ − 2 over Q in C (see Exercise 16.25). Then K/F is a finite
Galois extension; set G = Gal(K/F). We have G ≤ Aut(K) ≤ Sym(K), so
the inclusion map G ↪ Sym(K) is a group homomorphism. By Lemma 19.8,
we get an action of G on K. This action is given by σ · a = σ(a) for σ ∈ G
and a ∈ K. The set of all roots of f in K is R = {i^j ⁴√2 : j ∈ {0, 1, 2, 3}}.
Every element of K has the form Σ c_{j,k} (⁴√2)^j i^k with c_{j,k} ∈ Q, where j ∈
{0, 1, 2, 3} and k ∈ {0, 1}. Let us find the orbits of various elements of K under
the action of G. From Exercise 16.25, we know that for each j ∈ {0, 1, 2, 3}
and k ∈ {1, 3}, there is an element σ ∈ G such that σ(⁴√2) = i^j ⁴√2 and
σ(i) = i^k; and these 8 possibilities for σ give all the elements of G. So, for
example, we have Orb_G(i) = {i, i³} = {i, −i} and Orb_G(⁴√2) = {i^j ⁴√2 : j ∈
{0, 1, 2, 3}} = {⁴√2, i ⁴√2, −⁴√2, −i ⁴√2}. Since √2 = (⁴√2)² ∈ K, we can
compute Orb_G(√2 + 5i√2) = {√2 + 5i√2, −√2 − 5i√2, √2 − 5i√2, −√2 +
5i√2}. We note that the orbit of an element a ∈ K under the action of G is just
the set of all conjugates of a over F. Further, if we know σ(r) for every r ∈ R,
then we know σ(a) for every a ∈ K; this is because K = F[R], and σ must fix
every element of F. Notice also that the action of G on K restricts nicely to
an action of G on R, because σ(R) = R (setwise, not necessarily pointwise!)
for any σ ∈ G. Thus, we want to say that there is a group homomorphism
ψ : G → Sym(R), and further that ψ loses no information about G (i.e.,
ψ is an embedding). The reader is asked to prove these statements, in more
generality, in Exercises 19.9 and 19.10.
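
The orbit computations of Example 19.12 can be reproduced mechanically. The sketch below is one possible encoding, assumed here for illustration: an element of K is a dictionary of rational coefficients indexed by (j, k), standing for the sum of c · r^j · i^k with r = ⁴√2, and an automorphism is a pair (s, t) with σ(r) = i^s r and σ(i) = i^t, following the description of G quoted from Exercise 16.25. The names apply and orbit are hypothetical.

```python
from itertools import product

# An element of K = Q[r, i], with r the real fourth root of 2, is stored as
# {(j, k): c}, meaning the sum of c * r^j * i^k with j in 0..3 and k in 0..1.


def apply(sigma, a):
    """Apply σ = (s, t), where σ(r) = i^s r and σ(i) = i^t, to the element a."""
    s, t = sigma
    image = {}
    for (j, k), c in a.items():
        e = (s * j + t * k) % 4          # σ(r^j i^k) = r^j i^e
        sign = -1 if e >= 2 else 1       # i^2 = -1 and i^3 = -i
        key = (j, e % 2)
        image[key] = image.get(key, 0) + sign * c
    return {key: c for key, c in image.items() if c != 0}


# The 8 elements of G: s in {0, 1, 2, 3} and t in {1, 3}.
G = list(product(range(4), (1, 3)))


def orbit(a):
    """Orb_G(a), with each element frozen so it can live in a set."""
    return {frozenset(apply(sigma, a).items()) for sigma in G}


i_elem = {(0, 1): 1}                 # the element i
root = {(1, 0): 1}                   # the element r = 4th root of 2
a = {(2, 0): 1, (2, 1): 5}           # sqrt(2) + 5 i sqrt(2), since r^2 = sqrt(2)

print(len(orbit(i_elem)), len(orbit(root)), len(orbit(a)))
```

The three orbit sizes agree with the sets listed in the example: two conjugates of i, four conjugates of ⁴√2, and four conjugates of √2 + 5i√2.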


Our next application of group actions involves “translating” a finite Galois 
extension by adjoining a single element: 


Proposition 19.13. Let K/F be a finite Galois extension. Suppose that K̃ =
K[a] is a finite simple extension field of K. Set F̃ = F[a]. Then K̃ is a finite
Galois extension of F̃, and we have Gal(K̃/F̃) ↪ Gal(K/F).




Proof. Note that K̃ is finite over F̃, by Lemma 15.26. We can write K = F[R],
where R is the set of all roots of f in K, for some polynomial f ∈ F[x] such
that f splits completely in K and has no repeated roots (by Exercise 16.13
and Theorem 15.21). Thus we have K̃ = F[R ∪ {a}] = F̃[R]. Certainly we
have f ∈ F̃[x], and R is also the set of all roots of f in K̃, for K ≤ K̃, and
f already factors completely over K. So we can use Exercise 16.13 again to
conclude that K̃ is Galois over F̃. Set G = Gal(K/F) and G̃ = Gal(K̃/F̃).
By Exercise 19.9, we have an embedding G ↪ Sym(R) given by σ ↦ σ|_R,
and likewise, G̃ ↪ Sym(R). Now consider the function ψ : G̃ → G, σ ↦ σ|_K.
By Proposition 16.25 together with Exercise 16.22, we have σ(K) = K, so ψ
is well-defined. It is easy to check that ψ is a group homomorphism. If σ ∈ G̃
and ψ(σ) = e_G, then we have σ|_K = id_K, so in particular, we have σ|_R = id_R;
but then σ = e_G̃. Thus ψ is an embedding.


19.2 Reaping the Consequences 


Next we use group actions to obtain several classical results on the structure 
of finite groups, which give partial converses to Lagrange’s Theorem. 


Proposition 19.14. Let G be a non-trivial p-group. Then |Z(G)| > 1; that 
is, the center of G is non-trivial. 


Proof. Assume for a contradiction that Z(G) = {e}. Consider the action of G
on itself by conjugation. Let C be the set of all conjugacy classes of G, which
are (by definition) the orbits of G under this action. One such orbit is {e}.
If g ∈ G − {e}, then by our assumption, g ∉ Z(G), so C_G(g) ≠ G; therefore,
|Orb(g)| = |G| / |C_G(g)| = p^r for some integer r ≥ 1. So we have |G| =
Σ_{S∈C} |S| = 1 + Σ_{S∈C−{{e}}} |S| ≡ 1 (mod p). This contradicts the fact that
|G| ≡ 0 (mod p). Thus, our original assumption must be false, so |Z(G)| > 1,
as desired.
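
The counting argument in this proof is the class equation, and it can be seen at work in a small 2-group. The sketch below assumes, for illustration only, the dihedral group of order 8 realized as permutations of the vertices 0, 1, 2, 3 of a square; the generators r and s and the helpers mul, inv, and generate are choices made here.

```python
def mul(a, b):
    """Composition a∘b of permutations stored as tuples."""
    return tuple(a[b[i]] for i in range(len(a)))


def inv(a):
    """Inverse permutation of a."""
    result = [0] * len(a)
    for i, ai in enumerate(a):
        result[ai] = i
    return tuple(result)


def generate(gens):
    """Closure of a list of permutations under composition (finite case)."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            new = mul(s, g)
            if new not in group:
                group.add(new)
                frontier.append(new)
    return group


r = (1, 2, 3, 0)   # rotation of the square with vertices 0, 1, 2, 3
s = (0, 3, 2, 1)   # reflection fixing the vertices 0 and 2
G = generate([r, s])

center = {z for z in G if all(mul(z, g) == mul(g, z) for g in G)}
classes = {frozenset(mul(mul(g, h), inv(g)) for g in G) for h in G}
class_sizes = sorted(len(c) for c in classes)

# Class equation: |G| is the sum of the class sizes, and the classes of
# size 1 are exactly the elements of the center.
assert sum(class_sizes) == len(G)
assert class_sizes.count(1) == len(center)
print(len(G), sorted(center), class_sizes)
```

Every class outside the center has even size, so the number of singleton classes must be even as well; this is the congruence used in the proof, specialized to p = 2.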


As an application of Proposition 19.14, we prove a partial converse of 
Lagrange’s Theorem for p-groups. 


Proposition 19.15. Let p be prime, and let G be a p-group. Then for all 
natural numbers m dividing the order of G, there exists a subgroup of G of 
order m. 


Proof. Fix p, and suppose that |G| = p^r with r ∈ N. We proceed by induction
on r.

In the base case, we have |G| = 1, and we only have to check that G
contains a subgroup of order m = 1, which is true.

Inductively, suppose that the result is true for all p-groups of order less than
p^r. Let m divide p^r. The case m = 1 being trivial, we suppose that m > 1.




By Proposition 19.14, Z(G) ≠ {e}. Now Z(G) is a p-group by Lagrange’s
Theorem, so p divides the order of Z(G). Certainly Z(G) is abelian; so by
Exercise 18.14, Z(G) contains an element x of order p. Set H = ⟨x⟩. Since
H ⊴ G (by Exercise 9.11), we can form Q := G/H. Since Q is a p-group of
order p^{r−1} (by Corollary 8.19), we can find a subgroup T ≤ Q with |T| = m/p
by inductive hypothesis. By Exercise 7.17, we can lift T to a subgroup L of
G, where L = {ab : a ∈ G, b ∈ H, and aH ∈ T}. By Exercise 19.3, we have
T ≅ L/H. Therefore, |T| = |L| / |H| (by Corollary 8.19). So |L| = |T| · |H|
= m, as desired.


Theorem 19.16 (Cauchy’s Theorem). Let G be a finite group and let p be a 
prime number dividing the order of G. Then G has an element of order p. 


Proof. Fix the prime number p, and consider finite groups G whose order is
divisible by p. We proceed by induction on |G|. In the base case, we have
|G| = p, and so G certainly has an element of order p, because G is cyclic (by
Theorem 8.20).

Inductively, we may suppose that every proper subgroup of G of order
divisible by p contains an element of order p. If G has such a subgroup, we
are done; so we assume


G has no proper subgroups whose order is divisible by p. (19.2)


Consider the action of G on itself by conjugation, and let C be the set of all
conjugacy classes of G. For every g ∈ G − Z(G), we have C_G(g) < G, and so,
by hypothesis (19.2), p ∤ |C_G(g)|. Since |Orb(g)| = |G| / |C_G(g)| and p divides
|G|, it follows that p divides |Orb(g)| if g ∉ Z(G).
Note that the orbit of an element of Z(G) is a singleton set, so Z(G) is a union
of orbits, and consequently G acts on G − Z(G) (see Exercise 19.8). Let 𝒪 be
the set of orbits of G, and let ℬ ⊆ 𝒪 be the set of orbits of G − Z(G). Then
we have |G| = Σ_{O∈𝒪} |O| = |Z(G)| + Σ_{O∈ℬ} |O| ≡ |Z(G)| (mod p). Since p
divides |G|, this forces p to divide |Z(G)| as well. By hypothesis (19.2), we
must have Z(G) = G. But this means that G is abelian, so G has an element
of order p by Exercise 18.14.
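
Cauchy's Theorem is easy to test on a small non-abelian group. The sketch below assumes the alternating group A_4 (the even permutations of four points, of order 12) as an illustrative choice; the helpers mul, is_even, and order are names introduced here. It lists the orders of elements that actually occur.

```python
from itertools import permutations


def mul(a, b):
    """Composition a∘b of permutations stored as tuples."""
    return tuple(a[b[i]] for i in range(len(a)))


def is_even(p):
    """A permutation is even iff it has an even number of inversions."""
    inversions = sum(
        1
        for i in range(len(p))
        for j in range(i + 1, len(p))
        if p[i] > p[j]
    )
    return inversions % 2 == 0


def order(g):
    """Smallest n >= 1 with g^n equal to the identity."""
    identity = tuple(range(len(g)))
    n, power = 1, g
    while power != identity:
        power = mul(g, power)
        n += 1
    return n


A4 = [p for p in permutations(range(4)) if is_even(p)]
orders = sorted({order(g) for g in A4})
print(len(A4), orders)
```

Both primes dividing 12, namely 2 and 3, occur as element orders, as the theorem guarantees. The divisors 4, 6, and 12 do not occur, which is consistent with the theorem: it makes a promise only for prime divisors.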


In Corollary 18.13, we saw that every finite abelian group of order n has
a subgroup of order p^r, where p is prime and p^r || n. By tweaking the
argument used to prove Cauchy’s Theorem (Theorem 19.16), we can generalize
this result to all finite groups, including non-abelian groups. Notice that, by
Lagrange’s Theorem, there can never be a p-subgroup of order more than
p^r, so such subgroups must be maximal p-subgroups. Unlike in the abelian
case, however, such subgroups will not in general be unique. We note for what
follows that Sylow is pronounced “SEE low”.


Theorem 19.17 (Sylow’s Theorem, Weak Form). Let G be a group of order
n < ∞, and suppose that p^r || n, where p is prime and r is a positive integer.
Then G contains a subgroup of order p^r.


Proof. We fix p and proceed by induction on n. In the base case, n = p, we 
can take our subgroup to be G itself. 




Inductively, suppose that the result is true for all finite groups of order 
less than n. 

Case 1: p divides the order of the center Z = Z(G). 

Since Z is an abelian group, there is a subgroup Y ≤ Z such that |Y| = p^t,
where p^t || |Z| (by Corollary 18.13). Since Y ⊴ G (by Exercise 9.11 (b)), we
can form Q = G/Y. Now p^{r−t} || |Q| (since |Y| · |Q| = n), and t ≥ 1, so our
inductive hypothesis yields a subgroup T ≤ Q with |T| = p^{r−t}. By Exercise
7.17, we can lift T to a subgroup L of G, where L = {ab : a ∈ G, b ∈
Y, and aY ∈ T}. By Exercise 19.3, we have T ≅ L/Y, so |L| = |T| · |Y| = p^r.
Thus, L satisfies our requirements.

Case 2: The order of Z is prime to p, i.e., p does not divide |Z|.

Subcase (i): p^r divides the order of C_G(g) for some g ∈ G − Z.

Then C_G(g) ≠ G, since g ∉ Z. So our inductive hypothesis gives a
subgroup of C_G(g) of order p^r, which suits our purposes.

Subcase (ii): For all g ∈ G − Z, p^r does not divide |C_G(g)|.

Consider the action of G on itself by conjugation, and let C be the set of
all conjugacy classes of G.

Then p divides |Orb_G(g)| for all g ∈ G − Z. For g ∈ Z, we have Orb_G(g) =
{g}. Let 𝒵 be the collection of orbits of elements of Z. Then n = |G| =
Σ_{O∈C} |O| = Σ_{O∈𝒵} |O| + Σ_{O∈C−𝒵} |O| = |𝒵| + Σ_{O∈C−𝒵} |O| ≡ |Z| (mod p).
But this is a contradiction, since p | n while p ∤ |Z|. Thus, subcase (ii) is
impossible.


Definition 19.18. Let G be a finite group, and let p be a prime number. Let
p^r be the highest power of p which divides |G|. A subgroup of G of order p^r
is called a p-Sylow subgroup of G.
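
For a group as small as S_4 (of order 24 = 2³ · 3), the p-Sylow subgroups can simply be enumerated. The sketch below relies on an assumption stated plainly: it only finds subgroups generated by at most two elements, which is enough here because every subgroup of S_4 of order 8 or 3 is generated by two elements. The helper names mul and generate are illustrative.

```python
from itertools import permutations, product


def mul(a, b):
    """Composition a∘b of permutations stored as tuples."""
    return tuple(a[b[i]] for i in range(len(a)))


def generate(gens):
    """Closure of a list of permutations under composition (finite case)."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            new = mul(s, g)
            if new not in group:
                group.add(new)
                frontier.append(new)
    return group


G = list(permutations(range(4)))   # S_4, of order 24 = 2^3 * 3

# All subgroups generated by two (possibly equal) elements of G.
subgroups = {frozenset(generate([a, b])) for a, b in product(G, repeat=2)}

sylow_2 = [H for H in subgroups if len(H) == 8]   # 2-Sylow subgroups
sylow_3 = [H for H in subgroups if len(H) == 3]   # 3-Sylow subgroups
print(len(sylow_2), len(sylow_3))
```

The search finds more than one p-Sylow subgroup for each of p = 2 and p = 3, illustrating the earlier remark that such subgroups need not be unique in the non-abelian case. Both counts are congruent to 1 modulo p, in line with part (i) of Theorem 19.23 in the exercises.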


Example 19.19. We demonstrate how a conjugation action can be used in
tandem with Sylow’s Theorem to understand the structure of a group. Let G
be a group of order 6. By Sylow’s Theorem, there is a subgroup H ≤ G with
|H| = 2 and a subgroup K ≤ G with |K| = 3. Consider the action of G on
the set S = {gHg⁻¹ : g ∈ G} by conjugation. Now Stab_G(H) = N_G(H),
the normalizer of H in G, by definition of normalizer (see Exercise 7.16). We
know from Exercise 7.16 that H ≤ N_G(H), so we have |N_G(H)| ∈ {2, 6}, by
Lagrange’s Theorem.

Case 1: |N_G(H)| = 6. Then H ⊴ G. We also have K ⊴ G, by Exercise 8.7.
Lagrange’s Theorem forces H ∩ K = {e} and ⟨H ∪ K⟩ = G. Note that H and
K have prime order, and are thus cyclic, hence also abelian; we have H ≅ Z_2
and K ≅ Z_3. So by Exercise 17.5, we have G ≅ H × K ≅ H ⊕ K ≅ Z_2 ⊕ Z_3, and
G is abelian. We prefer to write G ≅ Z_6 (justified, for example, by Exercise
18.17).

Case 2: |N_G(H)| = 2. Then N_G(H) = H and |Orb_G(H)| = 6/2 = 3.
Note that Orb_G(H) = S. Let f : G → Sym(S) be the group homomorphism
of Lemma 19.8. Then ker(f) = {g ∈ G : gXg⁻¹ = X for all X ∈ S}
⊆ N_G(H) = H. Since |H| = 2, this forces ker(f) = {e} or ker(f) = H. Now
ker(f) ⊴ G, while N_G(H) = H ≠ G, so H is not normal in G. Therefore
ker(f) = {e}, so f is an embedding. We have Sym(S) ≅ S_3 (by Exercise 9.6),
and |G| = 6 = |S_3|, so f is an isomorphism.

We conclude that every group of order 6 is isomorphic to either Z_6 or S_3.

As another application of what we’ve learned up to this point, we consider
finite extension fields of the field C of complex numbers.


Example 19.20. Suppose that F is a finite extension of C. Since [C : R] = 2,
then F is also finite over R. In order to be able to use Galois Theory, we will
extend F to a normal closure K of F over R (see Exercise 16.12). Then K is
finite over R as well.

Since R has characteristic zero, K is separable over R, by Proposition
16.23. Thus, K is Galois over R. Set G = Gal(K/R), and let n = |G|.

Let’s investigate how n might factor, using our newfound Sylow Theory.
Since the number 2 seems to be important here, as the degree of C over R,
let’s take a Sylow 2-subgroup H of G. Then h := |H| is the highest power of 2
which divides |G|, so we can write n = hr, where r is odd. By the Fundamental
Theorem of Galois Theory, we have [K : R] = n and [K : K^H] = h. This
forces [K^H : R] = r.

Let a ∈ K^H. Then a is algebraic over R, and we can let f = Irr(a, R, x).
Let d = deg(f). Now we have [R[a] : R] = d, by Lemma 15.25. Since
R ≤ R[a] ≤ K^H, we must have d | r, so d is also odd.

Here we have a polynomial f ∈ R[x] of odd degree, which is irreducible
in R[x]. Yet we know from (pre)calculus that the values of a polynomial over
R of odd degree must approach infinity and negative infinity as its argument
approaches infinity and negative infinity (not necessarily respectively!). By
the Intermediate Value Theorem, we can thus be sure that f has a root in R.
Since f is irreducible in R[x], this forces d = 1. But this implies that a ∈ R.

Since a was an arbitrary element of K^H, we must have K^H = R. Thus,
r = [R : R] = 1, and n = h. In other words, n is a power of 2.

Let’s put F back in the picture now. We have R ≤ C ≤ F ≤ K, and
n = [K : R] is a power of 2; it follows that the degree [K : C] is also a power
of 2, say [K : C] = 2^m where m ≥ 0. Remembering how nicely p-groups
behave (2-groups, in this case), and recalling from Exercise 15.15 that C has
no extension fields of degree 2, we sense that we can say more. Indeed, by
Lemma 16.29, K is Galois over C, and we set H = Gal(K/C) ≤ G. Now
|H| = 2^m, so if m > 0, then H has a subgroup of index 2 (by Proposition
19.15), which by Galois Theory corresponds to an extension field of degree 2
over C; but this is impossible by Exercise 15.15. We conclude that m = 0, so
K = C. Since C ≤ F ≤ K, this forces F = C. Thus, we have shown that C
has no non-trivial finite extensions!


What does Example 19.20 tell us about the field C? From field theory, 
we know that finite extension fields correspond to irreducible polynomials: 
polynomials which cannot be factored non-trivially. Thus, we have shown that 
there are essentially no such polynomials over C (that is, none except linear 
polynomials). To be precise, we make the following definition. 




Definition 19.21. A field F is algebraically closed if every polynomial in F[x]
factors completely over F.


With this definition in hand, we are ready to prove 


Theorem 19.22 (Fundamental Theorem of Algebra). C is algebraically 
closed. 


Proof. Let f ∈ C[x]. If f ∈ C, then f = c is a product of zero linear factors,
so certainly f factors completely over C.

So suppose f ∉ C. Then by Lemma 15.11, we can write f = ∏_{i=1}^{n} f_i,
where each f_i is irreducible in C[x]. Now each f_i must be linear, for otherwise,
C[x]/(f_i) would be a finite extension field of C of degree greater than 1 (by
Theorem 15.10), which is impossible according to Example 19.20. By factoring
out the leading coefficients of the f_i’s, we can write f in the form required by
Definition 15.14.


This may be a good time to pause and reflect on what we have done. 
As its name suggests, Theorem 19.22, the Fundamental Theorem of Algebra, 
is a rather important classical result. Carl Friedrich Gauss, considered by 
many people to be one of the greatest mathematicians of all time, devoted his 
doctoral work to proving this theorem. By toying around with the ideas and 
results we proved about groups, fields, polynomials, and their relationships 
to each other (in Example 19.20), what naturally fell out was precisely this 
theorem. We are reminded of the episode of the animated series Dragon Ball® 
in which the diminutive young Goku, having been trained by the expert Master 
Roshi, competes in the World Martial Arts Competition for the first time. 
Goku’s opponent not even taking him seriously enough to face towards him in 
the ring, Goku attempts to merely get his opponent’s attention by delicately 
tapping him with a single outstretched finger; but such is Goku’s strength 
that this single finger-tap throws his opponent completely off-balance and he 
crashes down, immediately winning the match for Goku by ring-out. This is 
a good comparison to what our training in abstract algebra has done for us. 

And we are just warming up. 


19.3. Exercises 


Exercise 19.1. Let G be a group and let S be a set. Let · be a group action
of G on S. For each g ∈ G, let σ_g : S → S be defined by σ_g(x) = g · x.

(a) Prove that σ_g ∈ Sym(S).

(b) Prove that the function f : G → Sym(S) given by g ↦ σ_g is a group
homomorphism.




(c) Let A be the set of all group actions of G on S. Let H be the set of all
group homomorphisms from G to Sym(S). Define a function τ : A → H by
· ↦ f, where f is defined as in part (b) above. Prove that τ is a bijection.


Exercise 19.2. Suppose that the group G acts on the non-empty set S. Define
a relation ~ on S by x ~ y iff ∃g ∈ G s.t. g · x = y. Prove that ~ is an
equivalence relation on S, whose equivalence classes are just the orbits of G
on S.


Exercise 19.3. Let G be a group, N ⊴ G, and Q = G/N. Let T ≤ Q, and let
L be the lift of T to G (see Exercise 7.17).

(a) Prove that T ≅ L/N.

(b) Prove that if T ⊴ U ≤ Q and M is the lift of U to G, then L ⊴ M and
U/T ≅ M/L.


Exercise 19.4. Let S be a finite set, and let G = Sym(S). Let x_0, x_1, ...,
x_{r−1} be r distinct elements of S (for some r ≥ 1). Let σ be the corresponding
r-cycle, σ = (x_0, x_1, ..., x_{r−1}) ∈ G. Let τ be the cycle of σ corresponding to
x_0. Prove that τ = σ.


Exercise 19.5 (Disjoint cycles commute). Let S be a finite set, and let G =
Sym(S). Let x_0, x_1, ..., x_{r−1}, y_0, y_1, ..., y_{s−1} be r + s distinct elements of
S (for some r, s ≥ 1). Set σ = (x_0, x_1, ..., x_{r−1}) and τ = (y_0, y_1, ..., y_{s−1}).
Prove that στ = τσ. (We say that σ and τ are disjoint cycles since for all i, j,
we have x_i ≠ y_j.)


Exercise 19.6 (Every permutation is the product of its disjoint cycles). Let
S be a finite set, and let G = Sym(S). Let σ ∈ G, and let O_1, ..., O_r be
the orbits of S under the action of ⟨σ⟩. For each orbit O_i, choose an element
x_i ∈ O_i. Let c_i ∈ G be the cycle of σ corresponding to x_i.

(a) Explain why the cycles c_1, ..., c_r are mutually disjoint.

(b) Prove that σ = c_1 c_2 ··· c_r. We call this the disjoint cycle decomposition
of σ.

(c) Prove that |σ| = lcm(|c_1|, ..., |c_r|), the least common multiple of the
orders of the cycles. (Use Exercise 19.5 together with Exercise 3.14.)


Exercise 19.7 (Cycle structure determines conjugacy class). Let S be a finite
set, and let G = Sym(S).

(a) Let τ = (x_0, x_1, ..., x_{r−1}) ∈ G be an r-cycle, and let σ ∈ G. Show
that στσ⁻¹ = (σ(x_0), σ(x_1), ..., σ(x_{r−1})), another r-cycle. Observe that this
formula extends to products of cycles, since the conjugation-by-σ map is a
group homomorphism.

(b) For a permutation τ ∈ G, define the cycle structure of τ to be the
ordered tuple cs(τ) := (|c_1|, |c_2|, ..., |c_r|), where τ = c_1 c_2 ··· c_r is the disjoint
cycle decomposition of τ, arranged so that |c_1| ≤ |c_2| ≤ ··· ≤ |c_r|. (Note that
we can re-arrange the c_i’s as we wish, since they commute with each other.)
Prove that for every τ ∈ G, the conjugacy class of τ is


{ατα⁻¹ : α ∈ G} = {σ ∈ G : cs(σ) = cs(τ)}.




Exercise 19.8. Suppose that a group G acts on a set S. Explain why G also 
acts on any individual orbit of an element of S, via a restriction of the original 
action. Generalize to explain why G acts on any union of orbits. 


Exercise 19.9. Let K be a finite Galois extension of F, and let G = Gal(K/F).

(a) Observe that G acts on K in a natural way, as a subgroup of Sym(K).
Prove that the orbits of G on K are the sets R_f := {a ∈ K : f(a) = 0},
where f ranges over the irreducible polynomials Irr(β, F, x) for β ∈ K.

(b) From Exercise 16.13, we know that K is a splitting field over F for
some polynomial f ∈ F[x] − F without repeated roots. Let R be the set of all
roots of f in K. Show that R is a union of orbits, and that the action of G on
R (see Exercise 19.8) induces an embedding G ↪ Sym(R).


Exercise 19.10. Let K be a finite Galois extension of F, and let a ∈ K. Prove
that Irr(a, F, x) = ∏_{γ∈C} (x − γ), where C is the set of all conjugates of a over
F in K. (See Definition 16.7 and Exercise 16.7.)


Exercise 19.11. Let a = 3W2 — 24 4iW/2 € C. Find Irr(a,Q, x) =: f. You 
may leave f in factored form. Hint: Use Exercises 19.10 and 16.25. 


Exercise 19.12. Prove that if G is a group of order p², where p is prime, then
G is abelian. Hint: Use Proposition 19.14.


Exercise 19.13. Prove the following slightly stronger version of Proposition
19.15: If G is a p-group of order p^r, then there is a tower of subgroups {e_G} =
G_0 < G_1 < ··· < G_r = G such that |G_i| = p^i for all i ∈ {0, 1, ..., r}.


Exercise 19.14. Let G be a group, and let H ≤ G. Let ℒ be the set of all left
cosets of H in G.

(a) Prove that the operation · : G × ℒ → ℒ given by a · C = aC defines
a group action of G on ℒ.

(b) Now suppose in addition that [G : H] = n < ∞. Show that the group
action from part (a) above induces a group homomorphism σ : G → S_n whose
kernel is a subgroup of H.


Exercise 19.15. In this exercise, you will prove the “full-strength” version of 
Sylow’s Theorem: 


Theorem 19.23 (Sylow’s Theorem, Strong Form). Let G be a finite group, and
let p be a prime which divides the order of G. Let S be the set of all Sylow
p-subgroups of G. Then:

(i) |S| ≡ 1 (mod p).

(ii) All elements of S are conjugate to each other in G.

(iii) Every p-subgroup of G is contained in some Sylow p-subgroup of G.


(a) Prove part (i) by picking any element P ∈ S (which we can do because
of the Weak Sylow Theorem, Theorem 19.17), and considering the action of
P on S by conjugation. Hint: Lemmas 18.5 and 18.9 may be useful.

(b) Let P, Q ∈ S. Prove that there is an element x ∈ G such that
Q = xPx⁻¹ by considering the action of Q by conjugation on the set of
all conjugates of P in G.




(c) Let H be a p-subgroup of G. Prove that H is contained in some element
of S by considering the action of H on S by conjugation.


Exercise 19.16. Let G be a finite group. 

(a) Let p be a prime which divides the order of G. Prove that if G has a
unique Sylow p-subgroup H, then we must have H ⊴ G.

(b) Suppose that G has a unique p-Sylow subgroup G_p for each prime p
dividing the order of G, and that each G_p is cyclic. Prove that G is cyclic.
(See Exercises 4.13 and 18.17.)


Exercise 19.17. Let p be an odd prime number. In this exercise, you will
classify all groups of order 2p, generalizing the result of Example 19.19 and
incidentally proving that S_3 ≅ D_6.

Prove that every group of order 2p is either cyclic or dihedral, by following
the steps below:

(a) First suppose that G is an abelian group of order 2p. Show that G
must be cyclic.

(b) From now on, let G be a non-abelian group of order 2p. Prove that G
has exactly 1 element of order 1, p − 1 elements of order p, and p elements of
order 2.

(c) Consider the action of G by conjugation on the set of elements of G of
order p. Prove that every orbit has size 2.

(d) Use the previous parts of this exercise together with Exercise 10.6 to
prove that G ≅ D_{2p}.


20 


Learning from Z 


20.1 Introduction 


Though we have ventured some way into abstract territory, we still have more 
to learn from the ring of ordinary integers, Z. The two notions we want to 
generalize in this chapter are fractions and unique factorization into primes. 


20.2 Fractions 


“G_d created the integers; Humans worked out all the rest.”—Leopold Kro- 
necker. 

Following Kronecker’s lead, let us imagine that we are starting with Z, 
ignorant of the field Q, but aware of the general concept of a field. How would 
we go about constructing a field around Z—that is, building Q from Z? 

Our idea is to introduce fractions. But what is a “fraction”? To construct
a field K containing Z, we at least need an element n⁻¹ for each non-zero
n ∈ Z. Let us agree to use customary fraction notation, writing 1/n for n⁻¹.

Now we need to be able to multiply field elements, so for every pair m,
n ∈ Z where n ≠ 0, our field K must contain m · (1/n). Let us agree to write
this element as m/n.

We still have not addressed the fact that our field K must also be closed
under addition, and the new “summed” elements that we introduce must also
have their inverses and products in K as well! But from our prior experience
with Q, we anticipate that all of this will follow once we have the basic fraction
form m/n with m, n ∈ Z and n ≠ 0.

The most subtle point—again well-known from elementary-school math- 
ematics—is that two fractions which look different can represent the same 
number. Indeed, we should have m/n = a/b iff (bn) - (m/n) = (bn) - (a/b) iff 
bm = an. This is just the usual cross-multiplication formula. 

To summarize: it takes two ordinary integers to make a fraction, but some
pairs of integers give rise to the same fraction. Now to make this precise, we
can say that a fraction is an equivalence class of ordered pairs (m, n) with
m, n ∈ Z and n ≠ 0, where (m, n) is equivalent to (a, b) iff an = bm. Then
we would agree to write m/n for the equivalence class of (m, n) under this
equivalence relation.

DOI: 10.1201/9781003252139-20

We proceed to apply the ideas above by replacing Z with any domain 
R. Also, instead of always inverting all the non-zero elements of R, we will 
allow ourselves the freedom to choose a select subset U of elements for use 
in our denominators. Since multiplying two fractions should multiply their 
denominators, and since we don’t want to allow division by zero, we will 
require the following property for U: 


Definition 20.1. Let R be a commutative ring with 1, and let U ⊆ R. Then
U is called multiplicatively closed if 1 ∈ U, 0 ∉ U, and ∀a, b ∈ U, a · b ∈ U.


Now we are ready to construct our fractions: 


Lemma 20.2. Let $R$ be a domain, and let $U$ be a multiplicatively closed subset
of $R$. Define a relation $\sim$ on $R \times U$ by

$$(a,u) \sim (b,v) \text{ iff } av = bu.$$

Then $\sim$ is an equivalence relation.


Proof. (i) [Show reflexivity] Let $f \in R \times U$. We can write $f = (a,u)$ with
$a \in R$ and $u \in U$. Then $au = au$, so we have $(a,u) \sim (a,u)$, i.e., $f \sim f$.

(ii) [Show symmetry] Let $f, g \in R \times U$ and suppose that $f \sim g$. Write
$f = (a,u)$ and $g = (b,v)$ with $a, b \in R$ and $u, v \in U$. Then $av = bu$. So
$bu = av$, and therefore $g \sim f$.

(iii) [Show transitivity] Let $f, g, h \in R \times U$ and suppose that $f \sim g$ and
$g \sim h$. Write $f = (a,u)$, $g = (b,v)$, and $h = (c,w)$ with $a, b, c \in R$ and
$u, v, w \in U$. Then we have $av = bu$ and $bw = cv$. Multiplying the first equation
by $w$ and the second by $u$, we find $avw = buw$ and $buw = cuv$. Therefore,
$avw = cuv$. We can re-write this as $v \cdot (aw - cu) = 0$. Since $R$ is a domain
and since $v \in U \subseteq R - \{0\}$ (by definition of a multiplicatively closed set), we
have $aw - cu = 0$, and so $f \sim h$, as required.


Notation 20.3. We write $a/u$ for the equivalence class of $(a,u)$ in Lemma 20.2,
and we call $a/u$ a fraction with numerator $a$ and denominator $u$.
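
The relation of Lemma 20.2 can be tested by brute force on a small sample. The sketch below assumes $R = \mathbb{Z}$ and $U = \mathbb{Z} - \{0\}$, restricted to a small range; the function name `equivalent` and the sample `pairs` are illustrative choices, not notation from the text. It also shows why $0$ must be excluded from $U$: with a zero denominator, transitivity fails.

```python
from itertools import product

def equivalent(f, g):
    """(a, u) ~ (b, v) iff a*v == b*u (cross-multiplication)."""
    (a, u), (b, v) = f, g
    return a * v == b * u

# A small sample of R x U, assuming R = Z and U = Z - {0} (illustrative only).
pairs = [(a, u) for a in range(-3, 4) for u in range(-3, 4) if u != 0]

reflexive = all(equivalent(f, f) for f in pairs)
symmetric = all(
    equivalent(g, f)
    for f, g in product(pairs, repeat=2)
    if equivalent(f, g)
)
transitive = all(
    equivalent(f, h)
    for f, g, h in product(pairs, repeat=3)
    if equivalent(f, g) and equivalent(g, h)
)

# If a zero denominator were allowed, (0, 0) would be related to everything,
# and transitivity would break: (1,1) ~ (0,0) ~ (2,1) but (1,1) is not ~ (2,1).
zero_breaks_transitivity = (
    equivalent((1, 1), (0, 0))
    and equivalent((0, 0), (2, 1))
    and not equivalent((1, 1), (2, 1))
)
```

The last check mirrors step (iii) of the proof, where cancelling $v$ requires $v \neq 0$.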


Proposition 20.4. Let $R$ be a domain, and let $U$ be a multiplicatively closed
subset of $R$. Then the set of all fractions $a/u$ with $a \in R$ and $u \in U$ forms a
domain under the operations

$$(a/u) + (b/v) = (av + bu)/(uv) \quad \text{and} \tag{20.1}$$
$$(a/u) \cdot (b/v) = (ab)/(uv) \tag{20.2}$$

for $a, b \in R$ and $u, v \in U$.


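Proof. Since a fraction is an equivalence class of ordered pairs of elements of $R$,
we must first show that the operations $+$ and $\cdot$ on fractions are well-defined. So
suppose that $a/u = \tilde{a}/\tilde{u}$ and $b/v = \tilde{b}/\tilde{v}$ with $a, \tilde{a}, b, \tilde{b} \in R$ and $u, \tilde{u}, v, \tilde{v} \in U$.
Then we have $a\tilde{u} = \tilde{a}u$ and $b\tilde{v} = \tilde{b}v$. So
$\tilde{u}\tilde{v}(av + bu) = (a\tilde{u})v\tilde{v} + (b\tilde{v})u\tilde{u} = (\tilde{a}u)v\tilde{v} + (\tilde{b}v)u\tilde{u} = uv(\tilde{a}\tilde{v} + \tilde{b}\tilde{u})$.
Thus, $(av + bu)/(uv) = (\tilde{a}\tilde{v} + \tilde{b}\tilde{u})/(\tilde{u}\tilde{v})$.
So $+$ is well-defined. Also, $(ab)(\tilde{u}\tilde{v}) = (a\tilde{u})(b\tilde{v}) = (\tilde{a}u)(\tilde{b}v) = (\tilde{a}\tilde{b})(uv)$, so
$(ab)/(uv) = (\tilde{a}\tilde{b})/(\tilde{u}\tilde{v})$. Thus $\cdot$ is also well-defined.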

Exercise 20.3 asks for a proof that these fractions form a commutative
ring $S$ with $0_S = 0_R/1_R$ and $1_S = 1_R/1_R$. To complete the proof of the
proposition, first suppose that $f, g \in S$ and $fg = 0_S$. [Show $f = 0_S$ or
$g = 0_S$.] Then we can write $f = q/d$ and $g = r/e$ with $q, r \in R$ and $d, e \in U$.
So $fg = (qr)/(de) = 0_R/1_R$. Therefore $(qr) \cdot 1_R = 0_R \cdot (de)$, so $qr = 0_R$.
Since $R$ is a domain by hypothesis, this forces $q = 0_R$ or $r = 0_R$; without
loss of generality, suppose the former. Then we have $q \cdot 1_R = 0_R = 0_R \cdot d$, so
$f = q/d = 0_R/1_R = 0_S$.

Finally, assume for a contradiction that $0_S = 1_S$. Then we have $0_R/1_R =
1_R/1_R$, so $0_R \cdot 1_R = 1_R \cdot 1_R$, and $0_R = 1_R$, contradicting that $R$ is a domain.
This completes the proof.
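
As a concrete check of well-definedness, the following sketch applies formulas (20.1) and (20.2) to two different representatives of the same fractions. It assumes $R = \mathbb{Z}$ with non-zero denominators; the helper names `add`, `mul`, and `same_fraction` are hypothetical and chosen only for this illustration.

```python
def add(f, g):
    """Formula (20.1) on representatives: (a, u) + (b, v) = (a*v + b*u, u*v)."""
    (a, u), (b, v) = f, g
    return (a * v + b * u, u * v)

def mul(f, g):
    """Formula (20.2) on representatives: (a, u) * (b, v) = (a*b, u*v)."""
    (a, u), (b, v) = f, g
    return (a * b, u * v)

def same_fraction(f, g):
    """Cross-multiplication test from Lemma 20.2."""
    (a, u), (b, v) = f, g
    return a * v == b * u

# Two representatives of 1/2 and two representatives of 2/3.
f, f_alt = (1, 2), (3, 6)
g, g_alt = (2, 3), (-4, -6)

sum_ok = same_fraction(add(f, g), add(f_alt, g_alt))
prod_ok = same_fraction(mul(f, g), mul(f_alt, g_alt))
```

The resulting pairs differ as ordered pairs, but they represent the same fraction, which is exactly what the first part of the proof establishes in general.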


Definition 20.5. Let $R$ be a domain, and let $U$ be a multiplicatively closed
subset of $R$. The ring of fractions $a/u$ with $a \in R$ and $u \in U$ is called the ring
of fractions of $R$ with denominators from $U$ or the localization of $R$ at $U$, and
is denoted $R[U^{-1}]$. That is, we set

$$R[U^{-1}] := \{a/u : a \in R,\ u \in U\},$$

with ring operations as given in Proposition 20.4.

It is natural to try to identify the fraction $r/1_R$ with the element $r \in R$.
The following result allows us to do that.


Lemma 20.6. Let $R$ be a domain, and let $U$ be a multiplicatively closed subset
of $R$. Then $R \hookrightarrow R[U^{-1}]$ via the natural map $r \mapsto r/1_R$.


Proof. It is easy to see that the function $\nu : R \to R[U^{-1}]$ defined by $\nu(r) =
r/1_R$ is a ring homomorphism. We verify that $\nu$ is injective by showing that
its kernel is trivial (see Exercise 11.23). So suppose that $r \in R$ and $\nu(r) = 0$.
This means that $r/1_R = 0_R/1_R$, and so $r \cdot 1_R = 0_R \cdot 1_R$. Thus $r = 0_R$, as
desired.


Remark 20.7. Because of Lemma 20.6, we may choose to identify $R$ with its
image in $R[U^{-1}]$ under the natural map. That is, we can think of $R$ as a
subring of $R[U^{-1}]$. We illustrate this point of view in the following result,
which captures the fundamental idea that localizations create more units.


Lemma 20.8. Let $R$ be a domain, and let $U$ be a multiplicatively closed subset
of $R$. Then, viewing $U$ as a subset of $R[U^{-1}]$, we have $U \subseteq (R[U^{-1}])^\times$.


Proof. For $u \in U$, we have $1_R/u \in R[U^{-1}]$, and certainly $u \cdot (1_R/u) = u/u =
1_R/1_R = 1$ in $R[U^{-1}]$.




We want to say that the localization of $R$ at $U$ is the “most efficient” way to
invert everything in $U$, that is, to turn everything in $U$ into a unit. Reminiscent
of the universal property of direct products, we can express this by saying that
the localization $R[U^{-1}]$ is a mandatory first stop on any route which inverts
everything in $U$:


Lemma 20.9 (Universal Property of Localization). Let $R$ be a domain, and
let $U$ be a multiplicatively closed subset of $R$. If $\sigma : R \to S$ is a ring
homomorphism such that $\sigma(U) \subseteq S^\times$, then $\sigma$ factors through the localization of $R$ at
$U$; more specifically, there is a unique ring homomorphism $\omega : R[U^{-1}] \to S$
such that the following diagram commutes:

$$R \xrightarrow{\ \nu\ } R[U^{-1}] \xrightarrow{\ \omega\ } S, \qquad \omega \circ \nu = \sigma, \tag{20.3}$$

where $\nu : a \mapsto a/1_R$ is the natural map.


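Proof. Suppose that $\sigma : R \to S$ is a ring homomorphism with $\sigma(U) \subseteq S^\times$.
Set $T = R[U^{-1}]$. We first suppose there is a ring homomorphism $\omega : T \to S$
such that $\omega \circ \nu = \sigma$. Then for any $r \in R$, we have $\omega(r/1_R) = \omega(\nu(r)) = \sigma(r)$.
In particular, when $u \in U$, we have $\omega(u/1_R) = \sigma(u)$. Let $N = \nu(U)$. Then
we have $N \subseteq T^\times$, $\omega(N) \subseteq S^\times$, and $1_T \in N$. Let $M = \langle N \rangle \le T^\times$ be the
group generated by $N$ under multiplication. By Exercise 20.4, the restriction
$\omega|_M : M \to S^\times$ is a group homomorphism with respect to multiplication. So
for any fraction $r/u \in T$ with $r \in R$ and $u \in U$, we have
$\omega(r/u) = \omega((r/1_R) \cdot (1_R/u)) = \omega(r/1_R) \cdot \omega(1_R/u) = \sigma(r) \cdot \omega((u/1_R)^{-1}) = \sigma(r) \cdot (\omega(u/1_R))^{-1}
= \sigma(r) \cdot (\sigma(u))^{-1}$. This shows that $\omega$ is uniquely determined by $\sigma$.

To finish the proof, we attempt to define a function $\omega : T \to S$ by
the formula $\omega(r/u) = \sigma(r) \cdot (\sigma(u))^{-1}$, and show that $\omega$ is a well-defined ring
homomorphism satisfying $\omega \circ \nu = \sigma$. So suppose that $r/u = \tilde{r}/\tilde{u}$ with $r, \tilde{r} \in R$
and $u, \tilde{u} \in U$. Then we have $r\tilde{u} = \tilde{r}u$. So $\sigma(r\tilde{u}) = \sigma(\tilde{r}u)$. Since $\sigma$ is a ring
homomorphism and since $\sigma(U) \subseteq S^\times$, we have $\sigma(r)\sigma(\tilde{u}) = \sigma(\tilde{r})\sigma(u)$ and
$\sigma(r)(\sigma(u))^{-1} = \sigma(\tilde{r})(\sigma(\tilde{u}))^{-1}$. This shows that $\omega$ is well-defined.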

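Next, let $f, g \in T$, and write $f = a/v$ and $g = b/w$ with $a, b \in R$ and
$v, w \in U$. Then

$$\begin{aligned}
\omega(f + g) &= \omega\big((aw + bv)/(vw)\big) \\
&= \sigma(aw + bv)(\sigma(vw))^{-1} \\
&= \sigma(aw)(\sigma(vw))^{-1} + \sigma(bv)(\sigma(vw))^{-1} \\
&= \sigma(a)\sigma(w)(\sigma(v)\sigma(w))^{-1} + \sigma(b)\sigma(v)(\sigma(v)\sigma(w))^{-1} \\
&= \sigma(a)\sigma(w)(\sigma(v))^{-1}(\sigma(w))^{-1} + \sigma(b)\sigma(v)(\sigma(v))^{-1}(\sigma(w))^{-1} \\
&= \sigma(a)(\sigma(v))^{-1} + \sigma(b)(\sigma(w))^{-1} \\
&= \omega(f) + \omega(g).
\end{aligned}$$

And $\omega(fg) = \omega((ab)/(vw)) = \sigma(ab)(\sigma(vw))^{-1} = \sigma(a)\sigma(b)(\sigma(v))^{-1}(\sigma(w))^{-1}
= \sigma(a)(\sigma(v))^{-1} \cdot \sigma(b)(\sigma(w))^{-1} = \omega(f)\omega(g)$. Therefore, $\omega$ is a ring
homomorphism.

Finally, we note that since $1_R \in U$, we have $\sigma(1_R) \in S^\times$, so $\sigma$ restricts to
a multiplicative group homomorphism from $\{1_R\}$ to $S^\times$ (by Exercise 20.4).
Thus we have $\sigma(1_R) = 1_S$. So if $r \in R$, then we have $\omega(\nu(r)) = \omega(r/1_R)
= \sigma(r) \cdot (\sigma(1_R))^{-1} = \sigma(r) \cdot (1_S)^{-1} = \sigma(r)$, so $\omega \circ \nu = \sigma$, as desired.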
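
The formula $\omega(r/u) = \sigma(r) \cdot (\sigma(u))^{-1}$ can be tried out in a small case. The sketch below makes several assumptions that are not in the text: $R = \mathbb{Z}$, $U = \{3^j : j \in \mathbb{N}\}$, $S = \mathbb{Z}/7\mathbb{Z}$, and $\sigma$ is reduction modulo 7 (so $\sigma(U) \subseteq S^\times$ because 3 is invertible modulo 7). The names `sigma` and `omega` follow the proof; the three-argument `pow` with exponent $-1$ requires Python 3.8 or later.

```python
P = 7  # modulus of the target ring S = Z/7Z (an illustrative choice)

def sigma(r):
    """Reduction mod 7, a ring homomorphism Z -> Z/7Z."""
    return r % P

def omega(r, u):
    """omega(r/u) = sigma(r) * sigma(u)^(-1); u is assumed to be a power of 3."""
    return (sigma(r) * pow(sigma(u), -1, P)) % P

# Well-definedness: 1/3 = 3/9 = 9/27 must all have the same image.
images_of_one_third = {omega(1, 3), omega(3, 9), omega(9, 27)}

# Compatibility with the natural map: omega(r/1) = sigma(r).
factors_through = all(omega(r, 1) == sigma(r) for r in range(-20, 21))

# Homomorphism checks on 2/3 and 5/9: the sum is 33/27 and the product is 10/27.
additive = (omega(2, 3) + omega(5, 9)) % P == omega(33, 27)
multiplicative = (omega(2, 3) * omega(5, 9)) % P == omega(10, 27)
```

Note that `omega` takes a representative $(r, u)$ rather than a fraction, so the first check is the computational counterpart of the well-definedness step in the proof.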


Remark 20.10. The reader is encouraged to use the universal property ex- 
pressed by Lemma 20.9 to the greatest extent possible when proving results 
about localizations, and correspondingly to avoid explicit formulas or mani- 
festations of localizations (such as the fraction representation) where possible. 
The result of Exercise 20.5 shows that this is a reasonable approach. 


The following result captures the notion that the more denominators we 
allow, the bigger the resulting ring. The extreme cases on either side occur 
when $U = \{1\}$ and when $U = R - \{0\}$.


Proposition 20.11. Let $R$ be a domain, and let $U$, $V$ be multiplicatively
closed subsets of $R$ with $U \subseteq V$. Then we have

(1) $R[U^{-1}] \hookrightarrow R[V^{-1}]$ via the map $a/u \mapsto a/u$;

(2) If $U = \{1_R\}$, then the natural map $\nu : R \to R[U^{-1}]$, $a \mapsto a/1_R$ is a
ring isomorphism;

(3) If $V = R - \{0_R\}$, then $R[V^{-1}]$ is a field.


Proof. (1) First, identify $R$ as a subring of both $R[U^{-1}]$ and $R[V^{-1}]$ via the
natural maps $\nu_U : R \to R[U^{-1}]$ and $\nu_V : R \to R[V^{-1}]$ (using Lemma
20.6). Then we have $U \subseteq V \subseteq (R[V^{-1}])^\times$, so $\nu_V$ factors as $\nu_V = \omega \circ \nu_U$ for a
unique ring homomorphism $\omega : R[U^{-1}] \to R[V^{-1}]$ (by Lemma 20.9). We
have $\omega(a/1_R) = \omega(\nu_U(a)) = \nu_V(a) = a/1_R$ for $a \in R$; so $\omega(a/u) = \omega(a \cdot u^{-1})
= \omega(a) \cdot \omega(u^{-1}) = \omega(a) \cdot (\omega(u))^{-1}$ (using Exercise 20.4) $= (a/1_R) \cdot (u/1_R)^{-1}
= a/u$ for $a \in R$ and $u \in U$. Finally, we note that $\omega$ is injective, since if
$a/u = 0 = 0_R/1_R$ in $R[V^{-1}]$, then $a = 0_R$, and so $a/u = 0$ in $R[U^{-1}]$ as well.

(2) Let $U = \{1_R\}$. We know that $\nu$ is an injective ring homomorphism
already (from Lemma 20.6). But a typical element of $R[U^{-1}]$ is of the form
$a/1_R = \nu(a)$ with $a \in R$, so $\nu$ is also surjective.

(3) Let $V = R - \{0_R\}$. We know that $R[V^{-1}]$ is a domain already (from
Proposition 20.4). Let $f \in R[V^{-1}] - (0)$. Then we can write $f = a/b$ with
$a \in R$ and $b \in R - \{0_R\}$. Since $f \neq 0$, we must have $a \neq 0_R$, and thus $b/a \in
R[V^{-1}]$. We have $f \cdot (b/a) = (ab)/(ab) = 1_R/1_R = 1$ in $R[V^{-1}]$. Therefore,
$f \in (R[V^{-1}])^\times$, as required.


Remark 20.12. Rather than use the universal property of Lemma 20.9 to prove 
part (1) of Proposition 20.11, it may be quicker to give a more elementary 
proof. But note that when dealing with fractions, even over domains, some 
care is needed. For example, consider the fractions a/u on either side of the 
expression “$a/u \mapsto a/u$” in part (1) of the proposition. These two fractions
are, in general, very different objects! (See Exercise 20.2.) 




Corollary 20.13. Every domain can be embedded in a field. 


Proof. Combine Lemma 20.6 with part (3) of Proposition 20.11. 


Definition 20.14. Let $R$ be a domain. The field $R[U^{-1}]$, where $U = R - \{0\}$,
is called the field of fractions of $R$.


Notation 20.15. The field of fractions of the domain R is denoted k(R). 


Example 20.16. The field of fractions of $\mathbb{Z}$ is $\mathbb{Q}$. If we just want to invert 3,
then the smallest set we can take for $U$ is $U = \{3^j : j \in \mathbb{N}\}$. With this choice
for $U$, the localization of $\mathbb{Z}$ at $U$ is $\mathbb{Z}[U^{-1}] = \{a/3^j : a \in \mathbb{Z},\ j \in \mathbb{N}\}$. On the
other hand, suppose we would like to invert as much as possible except 3 by
localizing $\mathbb{Z}$ at some appropriate multiplicative set $V$. In this case, we have to
realize that if we include any multiple of 3, say 12, in $V$, then we are already
allowing 3 as a denominator too, since $4/12 = 1/3$. So we must exclude all
multiples of 3 from $V$. With this in mind, we try $V = \mathbb{Z} - (3)$. This set $V$ is
indeed multiplicatively closed, since if $a, b \in \mathbb{Z} - (3)$ then $ab \in \mathbb{Z} - (3)$: this just
says, taking the contrapositive, that $3 \mid (ab)$ implies $3 \mid a$ or $3 \mid b$, which says
precisely that 3 is prime (see Exercise 20.1). When we localize $\mathbb{Z}$ at this set
$V$, we get the set of rational numbers whose denominators (in lowest terms)
do not have 3 as a factor: $\mathbb{Z}[V^{-1}] = \{a/b : a, b \in \mathbb{Z},\ 3 \nmid b\}$.
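
Both localizations in Example 20.16 sit inside $\mathbb{Q}$, so membership can be read off from the denominator in lowest terms. The following sketch uses Python's standard `fractions.Fraction`, which always stores fractions in lowest terms; the two function names are hypothetical and introduced only for this illustration.

```python
from fractions import Fraction

def in_Z_with_3_inverted(q):
    """True if q lies in Z[U^{-1}] for U = {3^j}: the reduced denominator is a power of 3."""
    d = Fraction(q).denominator
    while d % 3 == 0:
        d //= 3
    return d == 1

def in_Z_localized_at_complement_of_3(q):
    """True if q lies in Z[V^{-1}] for V = Z - (3): the reduced denominator is prime to 3."""
    return Fraction(q).denominator % 3 != 0

examples = [Fraction(5, 9), Fraction(4, 12), Fraction(1, 2), Fraction(7, 1)]
membership = [
    (q, in_Z_with_3_inverted(q), in_Z_localized_at_complement_of_3(q))
    for q in examples
]
```

For instance, $4/12$ reduces to $1/3$, so it belongs to the first ring but not the second, which is the point made in the example about excluding 12 from $V$.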


Example 20.17. Let $D$ be a domain, and consider the polynomial ring $R =
D[x]$. By Lemma 14.10, $R$ is a domain. So we may form the field of fractions
$K$ of $R$, namely,

$$K = \{f/g : f, g \in D[x] \text{ and } g \neq 0\}.$$


If D is not just a domain but a field, then we have a special designation for 
this field of fractions: 


Definition 20.18. Let $F$ be a field. Then the field of fractions of $F[x]$ is
called the field of rational functions over F (in the variable x), and is denoted 


Remark 20.19. There is a very satisfactory theory of localization over any 
commutative ring with 1, though it is somewhat more complicated than the 
theory over a domain. After completing the present text, the reader may wish 
to examine [3, Chapter 2]. 


20.3. Unique Factorization 


One of the nicest and most familiar features of the ring Z is the unique fac- 
torization of integers into primes, expressed in the Fundamental Theorem of 
Arithmetic: 




Theorem 20.20 (Fundamental Theorem of Arithmetic). Let $n$ be an integer,
$n > 1$. Then there exist unique prime integers $1 < p_1 \le p_2 \le \cdots \le p_r$ such
that we can write $n = \prod_{i=1}^{r} p_i$.


As we have mentioned, the use of the order relation $<$ (and $\le$) does not
carry over to general commutative rings with 1. Even in the ring $\mathbb{Z}$, we have
allowed that for every positive prime $p$, the number $-p$ is just as good a
prime. In general, the presence of units will get in the way of a truly unique
factorization; what salvaged the situation in $\mathbb{Z}$ was the fact that the only
units are $1$ and $-1$, and we were able to recover from this by requiring all our
factors to be greater than 1. But given the unavoidable presence of units, we
compromise with the following definitions.


Definition 20.21. Let $R$ be a commutative ring with 1, and let $a, b \in R$. We
say that $a$ and $b$ are associates in $R$ if $\exists u \in R^\times$ such that $a = ub$.


Definition 20.22. Let $R$ be a domain. Then $R$ is called a Unique Factorization
Domain, or UFD, if every non-zero non-unit element of $R$ factors into
irreducible elements of $R$ which are unique up to association. That is, $R$ is
called a UFD iff both

(UFD1) $\forall a \in R - R^\times - (0)$, $\exists a_1, \ldots, a_r \in R$ such that $a_1, \ldots, a_r$ are
irreducible in $R$ and $a = \prod_{i=1}^{r} a_i$; and

(UFD2) $\forall a \in R - R^\times - (0)$, if $a_1, \ldots, a_r, b_1, \ldots, b_s$ are irreducible elements
of $R$ such that $a = \prod_{i=1}^{r} a_i = \prod_{i=1}^{s} b_i$, then $s = r$, and, after a suitable
re-ordering of the indices, $a_i$ is an associate of $b_i$ for every $i$ with $1 \le i \le r$.


In order to see what is involved in being a UFD, we shall address properties 
(UFD1) and (UFD2) separately, and find what could go wrong. 

To violate (UFD1), there would need to be an element $a \in R - R^\times -
(0)$ which does not factor into irreducibles. But then $a$ itself must not be
irreducible, which implies that $a = a_1 a_1'$, where $a_1, a_1' \in R - R^\times - (0)$; and at
least one of $a_1$ or $a_1'$, say $a_1$, does not factor into irreducibles, or else $a$ itself
would too. But now $a_1$ must factor into $a_2 a_2'$, where $a_2$ does not factor into
irreducibles; and so on. So we get a sequence of factors $a_1, a_2, a_3, \ldots$ of $a$,
where $a_i = a_{i+1} a_{i+1}'$ for all $i \ge 0$; for convenience, we set $a_0 = a$. Remembering
our old adage “To divide is to contain” (see Exercise 12.10), we can rephrase
this in terms of ideals, as follows: we have a chain of principal ideals

$$(a_0) \subseteq (a_1) \subseteq (a_2) \subseteq \cdots$$

in $R$. Moreover, because each $a_{i+1}'$ is a non-unit and each $a_{i+1}$ is non-zero,
we can say $(a_i) = (a_{i+1} a_{i+1}') \neq (a_{i+1})$, by Lemma 12.16. Thus, to violate (UFD1),
there must be an infinitely long strictly increasing chain of principal ideals in
$R$. This motivates the following definition.




Definition 20.23. Let $R$ be a commutative ring with 1. We say that $R$
satisfies the ascending chain condition on principal ideals if there is no infinite
strictly increasing chain of principal ideals

$$(a_0) \subsetneq (a_1) \subsetneq (a_2) \subsetneq \cdots$$

in $R$; equivalently, if every ascending chain of principal ideals in $R$

$$(a_0) \subseteq (a_1) \subseteq (a_2) \subseteq \cdots$$

must stabilize: that is, there must exist a number $N \in \mathbb{N}$ such that $(a_i) =
(a_{i+1})$ for all $i \ge N$.


Remark 20.24. We often express this last condition by saying that $(a_i) =
(a_{i+1})$ for all sufficiently large $i$, or for $i \gg 0$.
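
In $\mathbb{Z}$ the chain condition is easy to watch in action, since “to divide is to contain”: a strictly increasing chain of principal ideals corresponds to repeatedly stripping off a non-unit factor. The sketch below, with hypothetical helper names, builds the longest such chain starting from $(a_0)$ for a positive integer $a_0$ by removing one prime factor per step.

```python
def smallest_prime_factor(n):
    """Smallest prime dividing n, for an integer n >= 2 (trial division)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

def strict_chain(a0):
    """Generators a0, a1, ... with (a_i) strictly contained in (a_{i+1}).

    Each step divides by one prime, so the chain must stop at (1) = Z.
    Assumes a0 is a positive integer.
    """
    chain = [a0]
    while chain[-1] > 1:
        a = chain[-1]
        chain.append(a // smallest_prime_factor(a))
    return chain

chain = strict_chain(360)  # 360 = 2^3 * 3^2 * 5 has six prime factors
```

The number of strict inclusions equals the number of prime factors counted with multiplicity, so no strictly increasing chain starting at $(360)$ can have more than six steps.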


Remark 20.25. There is a closely related condition which is immensely
important in commutative algebra. Namely, a commutative ring $R$ with 1 is called
noetherian if every ascending chain of ideals of $R$ (not necessarily principal)
stabilizes. Notice that the noetherian condition is stronger than the ascending
chain condition on principal ideals; nonetheless, many important families of
rings satisfy the noetherian condition.

Next we investigate how part (2) of the UFD definition can fail. To violate
(UFD2), we would need to have irreducibles $a_1, \ldots, a_r, b_1, \ldots, b_s$ in $R$ such that
$\prod_{i=1}^{r} a_i = \prod_{i=1}^{s} b_i$, but at least one of the $a_i$'s, say $a_1$, is not an associate of
any of the $b_i$'s (see Exercise 20.13 for the reason we don't need to consider
separately the possibility that $r \neq s$). Now $a_1$ divides $\prod_{i=1}^{s} b_i$ in $R$, but we
must not have $a_1 \mid b_i$ in $R$ for any $i$, since that would force $a_1$ and $b_i$ to be
associates (by Exercise 20.11). This would be impossible if $a_1$ were prime (by
Exercise 20.12). Therefore, to violate (UFD2), there must be an irreducible element
of $R$ which is not prime. It turns out that the conditions we were led to in
this discussion are both necessary and sufficient:


Theorem 20.26. Let R be a domain. Then R is a UFD iff R satisfies the 
ascending chain condition on principal ideals and every irreducible element of 
R is prime. 


Proof. ($\Rightarrow$): Suppose $R$ is a UFD. Assume for a contradiction that $(a_0) \subsetneq
(a_1) \subsetneq (a_2) \subsetneq \cdots$ is a strictly increasing chain of principal ideals of $R$. Then
for $i \ge 1$, we must have $a_i \neq 0$, since $(a_i)$ properly contains $(a_0)$; and (using
Exercise 12.6) $a_i \notin R^\times$, since $(a_i)$ is a proper subset of $R$. By property (UFD1),
we can factor each $a_i$ (for $i \ge 1$) into some number $r_i$ of irreducibles in $R$, and
by (UFD2), $r_i$ does not depend on which factorization of $a_i$ we choose. Now
for any $i \ge 1$, since $(a_i) \subsetneq (a_{i+1})$, we can write

$$a_i = f_i a_{i+1} \tag{20.4}$$

for some $f_i \in R - R^\times - (0)$. Factoring each side of Equation 20.4 into
irreducibles shows that $r_i > r_{i+1}$ for $i \ge 1$. Now each $r_i$ is a positive integer for
$i \ge 1$, yet we have $r_1 > r_2 > r_3 > \cdots$, which is impossible. This contradiction
shows that $R$ satisfies the ascending chain condition for principal ideals.

Next, let $a$ be an irreducible element of $R$, and suppose that $b, c \in R$ and
$a \mid bc$ in $R$, so that $(a) \supseteq (bc)$. Then $a \neq 0$ and $a \notin R^\times$ (by definition of
irreducible). [Show: $a \mid b$ or $a \mid c$.] If $b \in R^\times$, then we have $(bc) = (c)$, so that
$(a) \supseteq (c)$, and $a \mid c$, so we are done. Similarly, we are done if $c \in R^\times$. So we
suppose that $b$ and $c$ are non-units. Likewise, we may suppose that $b, c \neq 0$,
since certainly $a \mid 0$. Now we can write $bc = ad$ with $d \in R$, since $a \mid bc$ in $R$;
and we can factor $b$ and $c$ into irreducibles in $R$, by (UFD1). Write $b = \prod_{i=1}^{r} b_i$
and $c = \prod_{i=1}^{s} c_i$ with $b_i$, $c_i$ irreducible in $R$ and $r, s \ge 1$. We must have $d \neq 0$,
since $b, c \neq 0$ and $R$ is a domain. Now we have

$$b_1 \cdots b_r \, c_1 \cdots c_s = ad. \tag{20.5}$$

If $d \in R^\times$, then $ad$ is irreducible in $R$, by Exercise 12.3; so the number of
irreducible factors on the left and right sides of Equation 20.5 is not the same,
contradicting (UFD2). Therefore, $d \notin R^\times$. So we can factor $d$ into irreducibles;
write $d = \prod_{i=1}^{t} d_i$, where each $d_i$ is irreducible in $R$. Then we may apply
(UFD2) to the equation

$$b_1 \cdots b_r \, c_1 \cdots c_s = a \cdot d_1 \cdots d_t \tag{20.6}$$

to conclude that $a$ is an associate of one of the $b_i$'s or one of the $c_i$'s. Therefore,
either $a \mid b$ or $a \mid c$, as desired.

($\Leftarrow$): The argument preceding the statement of the theorem proves this
direction.


We note that every field is vacuously a UFD, since there are no elements 
other than units and 0 (and thus no irreducible elements either). After fields, 
we have said that PIDs are the nicest type of ring we have studied so far; and 
Z is both a PID and a UFD; but it is still a little amazing that all PIDs enjoy 
unique factorization: 


Proposition 20.27. Every PID is a UFD. 


Proof. Let $R$ be a PID. Suppose that $(a_0) \subseteq (a_1) \subseteq (a_2) \subseteq \cdots$ is an ascending
chain of principal ideals of $R$. Set $I = \bigcup_{i=0}^{\infty} (a_i)$. Then $I$ is an ideal of $R$ (by
Exercise 20.14), so, as $R$ is a PID, we can write $I = (a)$ for some $a \in R$. Now
$a \in I$, so (by definition of union) we must have $a \in (a_j)$ for some $j$. This
forces $(a) \subseteq (a_j)$. But $(a) = I$, so the chain is stable after index $j$: that is, we
have $(a_i) = I$ for all $i \ge j$.

Next, suppose that $a$ is an irreducible element of $R$. Let $b, c \in R$, and
suppose that $a$ divides $bc$ in $R$. [Show: $a \mid b$ or $a \mid c$.] [Idea: We want to show
that either $a$ and $b$ have $a$ as a common factor, or else $a$ and $c$ have $a$ as
a common factor; generating an ideal using two elements should bring out a
common factor: $(gf, hf) \subseteq (f)$.] Let $I = (a,b)$ be the ideal of $R$ generated by
$a$ and $b$. Since $R$ is a PID, we can write $I = (d)$ for some $d \in R$. Now $a \in I$
by construction, so $d \mid a$ in $R$, and we have $a = dd'$ for some $d' \in R$. Since
$a$ is irreducible in $R$, this forces $d \in R^\times$ or $d' \in R^\times$. In the former case, we
have $I = (d) = R$; in the latter case, we have $(a) = (dd') = (d)$ (by Lemma
12.16) $= I$. Similarly, working with the ideal $J = (a,c)$, we find that either
$J = R$ or $J = (a)$. Now if $I = (a)$, then we have $b \in (a)$, so $a \mid b$ in $R$, as
desired; similarly, we are done if $J = (a)$. So assume for a contradiction that
$I = J = R$. Then $1 \in I$ and $1 \in J$, so we can write (using Lemma 12.15)
$1 = r_1 a + r_2 b = s_1 a + s_2 c$ for some $r_1, r_2, s_1, s_2 \in R$. Thus we have

$$\begin{aligned}
1^2 &= (r_1 a + r_2 b) \cdot (s_1 a + s_2 c) \\
&= a(r_1 s_1 a + r_1 s_2 c + r_2 s_1 b) + r_2 s_2 bc \\
&\in (a),
\end{aligned}$$

the last inclusion following from the fact that $a$ divides $bc$ in $R$. So $1 \in (a)$,
which forces $(a) = R$. But this implies (by Exercise 12.6) that $a \in R^\times$,
contradicting the definition of irreducible.
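
In the PID $\mathbb{Z}$, the generator $d$ of $(a,b)$ and the coefficients expressing it can be computed with the extended Euclidean algorithm. The sketch below is only an illustration in $\mathbb{Z}$ (the function name `extended_gcd` is an assumption, not from the text); it then replays the key step of the proof with $a = 7$, $b = 10$, $c = 3$, where $(a,b) = (a,c) = \mathbb{Z}$.

```python
def extended_gcd(a, b):
    """Return (d, r, s) with d = gcd(a, b) >= 0 and r*a + s*b == d."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r < 0:  # normalize the generator to be non-negative
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_r, old_s, old_t

a, b, c = 7, 10, 3
d1, r1, r2 = extended_gcd(a, b)   # 1 = r1*a + r2*b, so (a, b) = Z
d2, s1, s2 = extended_gcd(a, c)   # 1 = s1*a + s2*c, so (a, c) = Z

# Expanding 1 = (r1*a + r2*b)(s1*a + s2*c) modulo a leaves only r2*s2*b*c.
residue = (r2 * s2 * b * c) % a
```

Since the residue is 1 rather than 0, $a$ cannot divide $bc$ here, which is the contrapositive of what the proof derives from $I = J = R$.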


Corollary 20.28. If $F$ is a field, then the polynomial ring $F[x]$ is a UFD.


Proof. By Theorem 14.22, $F[x]$ is a PID, hence $F[x]$ is a UFD by Proposition
20.27.


We can get a lot more out of Proposition 20.27 by starting with Corollary 
20.28 and combining it with our newfound knowledge that every domain is 
really part of a field (Corollary 20.13). The key results relating fractions to 
unique factorization go back to Gauss, who worked with Z and Q. But to do 
this, we first bring another familiar concept from Z to a general UFD: the 
notion of greatest common divisor, or gcd. 

Recall that for two positive integers $a$ and $b$, we may define $\gcd(a,b) =
\max\{c \in \mathbb{Z} : c \mid a \text{ and } c \mid b\}$. Again, this definition is not directly portable
to a general UFD, because we cannot express the notion of max: it depends
on the ordering in $\mathbb{Z}$. But there is another (better!) way to express gcd, which
is familiar to those who have dabbled in elementary number theory. Namely,
we can write $a$ and $b$ using a common (finite) set of primes, $a = \prod_{i=1}^{r} p_i^{d_i}$
and $b = \prod_{i=1}^{r} p_i^{e_i}$, with $d_i, e_i \in \mathbb{N}$; note that some of the exponents may be
zero. Then we have $\gcd(a,b) = \prod_{i=1}^{r} p_i^{\min\{d_i, e_i\}}$. For the sake of uniformity, we
can even write the factorizations of $a$ and $b$ as products over all the positive
primes, as in

$$a = \prod_{p\,:\,\text{prime}} p^{d_p},$$

if we are willing to accept that an infinite product of 1's is 1.

Now in a general UFD $R$, given a non-zero element $a \in R$, we want to
write something like $a = \prod_{p} p^{e_p}$, where $p$ runs over all of the prime elements
of $R$, and $e_p$ is the highest power of $p$ which divides $a$. Two problems emerge.
First, we don't want to include two primes in our factorization if they are
associates of each other in $R$, any more than we want to include both $3$ and
$-3$ in the same factorization of a positive integer. We can remove this problem
by taking a product over all prime ideals of the form $(p)$ instead of over all
prime elements $p$ of $R$. The second problem is that we will end up with an
infinite product, in general. This is fine, since only finitely many exponents will
be positive; we ignore the repeated factors of the form $(p)^0$, since multiplying
by 1 should have no effect.

To make these ideas work, we first need to know how to multiply two 
ideals. 


Definition 20.29. Let $R$ be a commutative ring with 1, and let $I$, $J$ be two
ideals of $R$. Then the ideal product of $I$ and $J$ is defined to be

$$I \cdot J := (\{a \cdot b : a \in I,\ b \in J\}),$$

the ideal of $R$ generated by the products of elements of $I$ with elements of
$J$. We define $I^0 := (1) = R$, and we define $I^n$ recursively as $I^n := I^{n-1} \cdot I$
for $n \ge 1$. In case $R$ is a UFD and $I$ is principal, we also define $I^\infty = (1)$, if
$I = R$; and $I^\infty = (0)$, if $I \neq R$. (For some motivation for this last definition,
see Exercise 20.16.)


Warning 20.30. In general, the setwise product will be a proper subset of the 
ideal product of two ideals, even though the notation alone does not distin- 
guish between these two types of product. However, in the cases of interest to 
us—where all ideals in question are principal—then Lemma 20.32 assures us 
that the two types of product give the same result. 


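Lemma 20.31. Let $R$ be a commutative ring with 1, and let $I$, $J$ be ideals of
$R$. Then we have

$$I \cdot J \subseteq I \cap J, \text{ and}$$
$$R = I^0 \supseteq I^1 = I \supseteq I^2 \supseteq I^3 \supseteq \cdots \supseteq I^\infty.$$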


Proof. If $a \in I$ and $b \in J$, then $ab \in I$ by (Id2) for $I$, and $ab \in J$ by (Id2)
for $J$, so $ab \in I \cap J$. Since $I \cdot J$ is the smallest ideal which contains all such
products, and $I \cap J$ is an ideal (by Exercise 11.8), we have $I \cdot J \subseteq I \cap J$. The
second statement follows immediately from the first, after checking the special
cases $I^1 = I$ and $I^\infty \subseteq I^n$ for any $n \in \mathbb{N}$.


Lemma 20.32. Let $R$ be a commutative ring with 1. Let $I = (a)$, $J = (b)$
be two principal ideals of $R$, with $a, b \in R$. Then we have $I \cdot J = (ab)$, and
$I^n = (a^n)$ for all $n \in \mathbb{N}$. Furthermore, the ideal product is associative and
commutative.


Proof. Exercise 20.15. 
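
For principal ideals of $\mathbb{Z}$, the inclusion $I \cdot J \subseteq I \cap J$ of Lemma 20.31 can be made concrete. By Lemma 20.32 the product $(a)(b)$ is generated by $ab$; the sketch below additionally relies on the standard fact, not proved in this section, that $(a) \cap (b)$ is generated by the least common multiple of $a$ and $b$. The function names are illustrative.

```python
from math import gcd

def product_generator(a, b):
    """A generator of (a)(b) in Z, using Lemma 20.32."""
    return a * b

def intersection_generator(a, b):
    """A generator of (a) ∩ (b) in Z for non-zero a, b: their least common multiple."""
    return abs(a * b) // gcd(a, b)

# (4)(6) = (24) sits inside (4) ∩ (6) = (12), and the inclusion is proper.
prod_46, meet_46 = product_generator(4, 6), intersection_generator(4, 6)

# For coprime generators the two ideals coincide: (4)(9) = (4) ∩ (9) = (36).
prod_49, meet_49 = product_generator(4, 9), intersection_generator(4, 9)
```

Containment of ideals is divisibility of generators in reverse, so $(24) \subseteq (12)$ is the statement that 12 divides 24.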


Next, we want to get our hands on the exponents in prime factorizations.
As promised, we will work with principal prime ideals instead of prime
elements of a UFD. Thus, it makes sense to replace the maximum exponent $e$
such that $p^e \mid a$ with the maximum exponent $e$ such that $(p)^e \supseteq (a)$.




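Definition 20.33. Let $R$ be a UFD. Let $a \in R - (0)$, and let $\mathfrak{p}$ be a principal
prime ideal of $R$. We set $\exp_{\mathfrak{p}}(a) = \max\{e \in \mathbb{N} : \mathfrak{p}^e \supseteq (a)\}$, and we call
this number the exponent of $\mathfrak{p}$ in $a$. We take $\exp_{\mathfrak{p}}(0) = \infty$.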
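
For $R = \mathbb{Z}$ and $\mathfrak{p} = (p)$ with $p$ a prime number, the condition $(p)^e \supseteq (a)$ just says $p^e \mid a$, so the exponent can be computed by repeated division. This is a minimal sketch under that assumption; the function name `exponent` is hypothetical.

```python
def exponent(p, a):
    """exp_(p)(a) in Z: the largest e with p**e dividing a, for a prime p and a != 0."""
    if a == 0:
        raise ValueError("the exponent of (p) in 0 is infinite by Definition 20.33")
    e = 0
    while a % p == 0:
        a //= p
        e += 1
    return e

e_54 = exponent(3, 54)          # 54 = 2 * 3^3
e_neg = exponent(3, -54)        # associates generate the same ideal
e_product = exponent(2, 12 * 40)
e_sum = exponent(2, 12) + exponent(2, 40)
```

The last two values agree, an instance of exponents turning multiplication into addition.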


Remark 20.34. In moving from prime elements to principal prime ideals in 
a UFD, we are picking up exactly one new ideal, namely the zero ideal, (0). 
This follows from Exercises 12.2 and 12.12. The element 0 straddles the line 
between being prime and non-prime, as it were, but we defined 0 as a non- 
prime, while (0) is prime. 

We now formally state how we are to handle the kinds of infinite products 
of ideals which we shall encounter. 


Convention 20.35. In an infinite product of ideals of a commutative ring with 
1: 

(i) If any factor is (0), then the product is (0). 

(ii) We remove factors of the form (1); if this results in an empty product 
of ideals, then we interpret the empty product as (1). 


An indication that we are on the right track is given by the following result. 


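Proposition 20.36. Let $R$ be a UFD, and let $a \in R$. Then we have

$$(a) = \prod_{\mathfrak{p} \in \mathfrak{P}} \mathfrak{p}^{e_{\mathfrak{p}}}, \tag{20.7}$$

where $\mathfrak{P}$ is the set of all principal prime ideals of $R$, and $e_{\mathfrak{p}} = \exp_{\mathfrak{p}}(a)$. If
$a \neq 0$, then all but finitely many of the exponents are 0, and furthermore, this
representation of $(a)$ is unique: that is, if $(a) = \prod_{\mathfrak{p} \in \mathfrak{P}} \mathfrak{p}^{f_{\mathfrak{p}}}$ where $f_{\mathfrak{p}} \in \mathbb{N}$ and
$f_{\mathfrak{p}} = 0$ for all but finitely many $\mathfrak{p}$, then $f_{\mathfrak{p}} = \exp_{\mathfrak{p}}(a)$ for all $\mathfrak{p} \in \mathfrak{P}$.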


Proof. If $a = 0$, then we have $\exp_{\mathfrak{p}}(a) = \infty$ for all $\mathfrak{p} \in \mathfrak{P}$, from Definition
20.33, so the product on the right-hand side of Equation 20.7 is the zero ideal
according to Definition 20.29 and Convention 20.35 (i). So from now on, we
suppose that $a \neq 0$.

We next treat the case when $a \in R^\times$. Let $\mathfrak{p} \in \mathfrak{P}$. We have $\mathfrak{p} \neq R$ by
definition of prime ideal, and $(a) = R$ by Exercise 12.6. So $\mathfrak{p} \not\supseteq (a)$, and
$\exp_{\mathfrak{p}}(a) = 0$. The desired formula now follows from Convention 20.35 (ii). As
for uniqueness, this follows from Lemma 20.31 and the fact that a prime ideal
must be proper.

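Finally, suppose that $a \in R - R^\times - (0)$. By (UFD1), we can write $a =
\prod_{j=1}^{r} a_j$ where each $a_j$ is irreducible in $R$. By Exercise 20.10, the relation of
association, $\sim$, is an equivalence relation on $R$, so we can find a finite number
of elements $p_1, \ldots, p_n \in R$ such that each $a_j$ is an associate of exactly one
$p_i$. Each $p_i$ is irreducible in $R$ by Exercise 12.3, hence $p_i$ is prime in $R$ by
Theorem 20.26. For $1 \le i \le n$, set $e_i = |\{j \in \mathbb{N} : 1 \le j \le r \text{ and } a_j \sim p_i\}|$.
Then by definition of association and Lemma 12.16, we have $(a_j) = (p_i)$ for
the unique $i$ such that $a_j \sim p_i$. Let $\mathfrak{p}_i = (p_i)$ for $i \in \{1, \ldots, n\}$. By Exercise
12.12, we have $\mathfrak{p}_i \in \mathfrak{P}$. From Lemma 20.32 we have

$$(a) = \prod_{j=1}^{r} (a_j) = \prod_{i=1}^{n} (p_i)^{e_i} = \prod_{i=1}^{n} \mathfrak{p}_i^{e_i} = \prod_{\mathfrak{p} \in \mathfrak{P}} \mathfrak{p}^{e_{\mathfrak{p}}},$$

where $e_{\mathfrak{p}} = e_i$ if $\mathfrak{p} = \mathfrak{p}_i$ for some $i$, and 0 otherwise. Now that we know that
such a formula exists, suppose that $(a) = \prod_{\mathfrak{p} \in \mathfrak{P}} \mathfrak{p}^{f_{\mathfrak{p}}}$ where $f_{\mathfrak{p}} \in \mathbb{N}$ and $f_{\mathfrak{p}} = 0$
for all but finitely many $\mathfrak{p}$. Using Lemma 20.31, we see that for each $\mathfrak{p}$, we
have $\mathfrak{p}^{f_{\mathfrak{p}}} \supseteq (a)$, so $\exp_{\mathfrak{p}}(a) \ge f_{\mathfrak{p}}$. On the other hand, if for some $\mathfrak{p}$ we had
$\mathfrak{p}^{f_{\mathfrak{p}}+1} \supseteq (a)$, then we would have $p_{\mathfrak{p}}^{f_{\mathfrak{p}}+1} \mid a$ in $R$, where $p_{\mathfrak{p}}$ is a generator of
$\mathfrak{p}$ in $R$. We demonstrate that this is impossible. Assume for a contradiction
that $a = g \cdot p_{\mathfrak{q}}^{f_{\mathfrak{q}}+1}$ for some $g \in R$ and $\mathfrak{q} \in \mathfrak{P}$. Note that we may write
$a = u \cdot \prod_{\mathfrak{p} \in \mathfrak{P}} p_{\mathfrak{p}}^{f_{\mathfrak{p}}}$ for some $u \in R^\times$. So we have $u^{-1} g p_{\mathfrak{q}} = \prod_{\mathfrak{p} \neq \mathfrak{q}} p_{\mathfrak{p}}^{f_{\mathfrak{p}}}$, so $p_{\mathfrak{q}}$
divides $\prod_{\mathfrak{p} \neq \mathfrak{q}} p_{\mathfrak{p}}^{f_{\mathfrak{p}}}$ in $R$. Since $p_{\mathfrak{q}}$ is prime in $R$, then $p_{\mathfrak{q}} \mid p_{\mathfrak{p}}$ in $R$ for some
$\mathfrak{p} \neq \mathfrak{q}$, by Exercise 20.12. So $p_{\mathfrak{p}}$ and $p_{\mathfrak{q}}$ are associates by Exercise 20.11, thus
$\mathfrak{p} = \mathfrak{q}$, a contradiction. Therefore, $f_{\mathfrak{p}} = \exp_{\mathfrak{p}}(a)$ for all $\mathfrak{p} \in \mathfrak{P}$. This concludes
the proof.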


Remark 20.37. A bird's-eye view of the significance of unique factorization is
that it allows us to turn a non-zero ring element $a$ into a collection of integers,
namely, the exponents in the prime factorization of $(a)$. In this process, we lose
the information about the units of the ring. Remembering that “a logarithm 
is an exponent,” these exponents may be viewed as “discrete logarithms.” Now 
logarithms transform multiplication into addition (recall Example 7.11). For 
a frighteningly beautiful realization of these ideas, see Exercise 20.23. 


Now we are ready to define gcd in a general UFD. 


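Definition 20.38. Let $a, b \in R$, where $R$ is a UFD. Let $\mathfrak{P}$ be the set of all
principal prime ideals of $R$. For $\mathfrak{p} \in \mathfrak{P}$, let $d_{\mathfrak{p}}$ and $e_{\mathfrak{p}}$ denote the exponent of $\mathfrak{p}$
in $a$ and $b$, respectively, with $d_{\mathfrak{p}}, e_{\mathfrak{p}} \in \mathbb{N} \cup \{\infty\}$. The greatest common divisor
of $a$ and $b$ in $R$ is the ideal

$$\gcd(a, b) := \prod_{\mathfrak{p} \in \mathfrak{P}} \mathfrak{p}^{\min\{d_{\mathfrak{p}},\, e_{\mathfrak{p}}\}} \subseteq R.$$

More generally, for any non-empty set $A \subseteq R$, define

$$\gcd(A) := \prod_{\mathfrak{p} \in \mathfrak{P}} \mathfrak{p}^{\min\{e_{\mathfrak{p}}(a) \,:\, a \in A\}},$$

where $e_{\mathfrak{p}}(a)$ is the exponent of $\mathfrak{p}$ in $a$.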


Remark 20.39. By Lemma 20.32 and Convention 20.35, every gcd is a principal 
ideal. 


Notation 20.40. We write $\gcd(a_1, a_2, \ldots, a_n)$ to mean $\gcd(\{a_1, a_2, \ldots, a_n\})$.
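
In $\mathbb{Z}$, Definition 20.38 recovers the familiar greatest common divisor as a generator of the ideal $\gcd(a,b)$. The sketch below, restricted to positive integers and using hypothetical helper names, takes the minimum of the exponents prime by prime and compares the result with Python's built-in `math.gcd`.

```python
from math import gcd

def factor_exponents(n):
    """Map each prime p dividing the positive integer n to its exponent in n."""
    exps, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            exps[p] = exps.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        exps[n] = exps.get(n, 0) + 1
    return exps

def gcd_generator(a, b):
    """A positive generator of gcd(a, b), built from min{d_p, e_p} for each prime p."""
    d, e = factor_exponents(a), factor_exponents(b)
    g = 1
    for p in d.keys() & e.keys():   # other primes have minimum exponent 0
        g *= p ** min(d[p], e[p])
    return g

g = gcd_generator(360, 84)   # 360 = 2^3 * 3^2 * 5 and 84 = 2^2 * 3 * 7
agrees = all(
    gcd_generator(a, b) == gcd(a, b)
    for a in range(1, 60)
    for b in range(1, 60)
)
```

Primes that divide only one of the two numbers contribute a factor $(p)^0 = (1)$ and are dropped, in line with Convention 20.35 (ii).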




Definition 20.41. Let $R$ be a UFD, and let $f = \sum_{i=0}^{d} a_i x^i \in R[x]$. The
content of $f$ is the principal ideal

$$\operatorname{cont}(f) := \gcd(a_0, a_1, \ldots, a_d) \subseteq R.$$


Lemma 20.42 (Gauss’ Lemma). Let R be a UFD, and let f,g € R[x]. Then 
we have cont( fg) = cont(f) - cont(g). 

Proof. Write $f = \sum_{i \ge 0} a_i x^i$ and $g = \sum_{i \ge 0} b_i x^i$ with $a_i, b_i \in R$. Write
$\operatorname{cont}(f) = (c)$, $\operatorname{cont}(g) = (d)$, and $\operatorname{cont}(fg) = (e)$ with $c, d, e \in R$. The
result follows directly from Exercise 20.18 (with $a = 0$) if $f$ or $g$ is 0, so we
suppose that $f, g \neq 0$, and consequently (by Exercise 20.22) $c, d, e \neq 0$. Let
$p$ be a prime element of $R$, and let $a = \exp_{(p)}(c)$ and $b = \exp_{(p)}(d)$. Then
we have $p^a \mid a_i$ and $p^b \mid b_i$ in $R$ for all $i$. It follows that $p^a \mid f$ and $p^b \mid g$ in
$R[x]$. Therefore, $p^{a+b}$ divides $fg$ in $R[x]$. So $\exp_{(p)}(e) \ge a + b$. On the other
hand, let $j = \max\{i : p^{a+1} \nmid a_i\}$ and $k = \max\{i : p^{b+1} \nmid b_i\}$. Then we
have $p^{a+1} \mid a_i$ for $i > j$, and $p^{b+1} \mid b_i$ for $i > k$. Now the coefficient of $x^{j+k}$
in $fg$ is $c_{j+k} = \sum_{i=0}^{j+k} a_i b_{j+k-i}$, and every term in this sum except for $a_j b_k$
is divisible by $p^{a+b+1}$. So we can write $c_{j+k} = p^{a+b+1} \cdot r + p^{a+b} s$ for some
$r, s \in R$ with $p \nmid s$. It follows that $p^{a+b+1} \nmid c_{j+k}$. Therefore, $\exp_{(p)}(e) \le a + b$.
We have shown that $\exp_{(p)}(e) = \exp_{(p)}(c) + \exp_{(p)}(d)$ for every prime element
$p$ of $R$. It follows from Proposition 20.36 that $(e)$ and $(cd)$ have the same
prime factorization in $R$, and $(e) = (cd)$, as desired.
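
Gauss' Lemma is easy to test numerically over $R = \mathbb{Z}$, where a generator of $\operatorname{cont}(f)$ is the ordinary gcd of the coefficients. The sketch below represents a polynomial as its coefficient list $[a_0, a_1, \ldots]$; this representation and the helper names `content` and `poly_mul` are assumptions made for the illustration.

```python
from functools import reduce
from math import gcd

def content(coeffs):
    """Non-negative generator of cont(f) for integer coefficients [a_0, a_1, ...]."""
    return reduce(gcd, coeffs, 0)

def poly_mul(f, g):
    """Product of two non-empty coefficient lists (lowest degree first)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

f = [6, 10, 4]   # 6 + 10x + 4x^2
g = [9, 0, 15]   # 9 + 15x^2
fg = poly_mul(f, g)

gauss_holds = content(fg) == content(f) * content(g)
```

Comparing non-negative generators suffices here because two integers generate the same ideal exactly when they agree up to sign.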


Corollary 20.43. Let $R$ be a UFD, let $K = k(R)$ be the field of fractions of $R$,
and let $f \in R[x]$. If $f = f_1 f_2$ with $f_1, f_2 \in K[x]$, then there exist $\tilde{f}_1, \tilde{f}_2 \in R[x]$
such that $\tilde{f}_i$ is an associate of $f_i$ in $K[x]$ and $f = \tilde{f}_1 \tilde{f}_2$.


Proof. The case $f = 0$ being easy, we suppose that $f \neq 0$. Suppose that $f =
f_1 f_2$ where $f_1, f_2 \in K[x]$. There exist $d_1, d_2 \in R - (0)$ such that $d_i f_i \in R[x]$;
for example, we can take $d_i$ to be the product of the denominators of the
coefficients of $f_i$. So $d_1 d_2 f = (d_1 f_1) \cdot (d_2 f_2)$ is a factorization of $d_1 d_2 f$ in $R[x]$.
By Gauss' Lemma, together with Exercise 20.18, we have

$$\operatorname{cont}(d_1 d_2 f) = (d_1 d_2)\operatorname{cont}(f) = \operatorname{cont}(d_1 f_1) \cdot \operatorname{cont}(d_2 f_2). \tag{20.8}$$

Write $(c_i) = \operatorname{cont}(d_i f_i)$, with $c_i \in R$, and set $g_i = d_i f_i / c_i$. Then $g_i \in R[x]$
(by Exercise 20.24), and $(d_1 d_2) \mid (c_1 c_2)$ in $R$. So we can write $c_1 c_2 = c d_1 d_2$
for some $c \in R$. Now $d_1 d_2 f = (d_1 f_1)(d_2 f_2) = (c_1 g_1)(c_2 g_2) = c d_1 d_2 g_1 g_2$, so
$f = c g_1 g_2$. We take $\tilde{f}_1 = c g_1$ and $\tilde{f}_2 = g_2$ to complete the proof.
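
The proof is constructive, and its steps can be followed for $R = \mathbb{Z}$ and $K = \mathbb{Q}$. The sketch below is an illustration with hypothetical helper names; polynomials are coefficient lists with the constant term first, and $d_i$ is taken to be the least common multiple of the denominators rather than their product, which works equally well.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def clear_denominators(f):
    """Return (d, d*f) with d a positive integer and d*f having integer coefficients."""
    d = reduce(lcm, (c.denominator for c in f), 1)
    return d, [int(c * d) for c in f]

def primitive_part(f):
    """Return (c, g) with (c) = cont(f) and f = c*g, for a non-zero integer polynomial f."""
    c = reduce(gcd, f, 0)
    return c, [a // c for a in f]

def poly_mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def lift_factorization(f1, f2):
    """Given f1, f2 over Q whose product has integer coefficients, return
    integer polynomials (c*g1, g2) with the same product, as in the proof."""
    d1, F1 = clear_denominators(f1)
    d2, F2 = clear_denominators(f2)
    c1, g1 = primitive_part(F1)
    c2, g2 = primitive_part(F2)
    c = Fraction(c1 * c2, d1 * d2)   # c1*c2 = c*d1*d2 with c in Z, by Gauss' Lemma
    assert c.denominator == 1
    return [int(c) * a for a in g1], g2

# 2x^2 + 3x + 1 = (x + 1/2)(2x + 2) over Q.
f1 = [Fraction(1, 2), Fraction(1)]
f2 = [Fraction(2), Fraction(2)]
lifted1, lifted2 = lift_factorization(f1, f2)
```

Each lifted factor differs from the original one by a non-zero rational constant, that is, by a unit of $\mathbb{Q}[x]$.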


Corollary 20.44. Let $R$ be a UFD, let $K = k(R)$ be the field of fractions of
$R$, and let $f \in R[x] - R$. If $f$ is irreducible in $R[x]$, then $f$ is irreducible in
$K[x]$.

Proof. Suppose that $f$ is irreducible in $R[x]$. Assume for a contradiction that $f$
is not irreducible in $K[x]$. Then we can write $f = f_1 f_2$ for some $f_i \in K[x] - K$.




By Corollary 20.43, there are elements $\tilde{f}_1, \tilde{f}_2 \in R[x]$ such that $\tilde{f}_i$ is an associate
of $f_i$ in $K[x]$ and $f = \tilde{f}_1 \tilde{f}_2$. But since units in $K[x]$ have degree zero, we have
$\deg(\tilde{f}_i) = \deg(f_i) \ge 1$, which contradicts the irreducibility of $f$ in $R[x]$.


Proposition 20.45. If $R$ is a UFD, then so is the polynomial ring $R[x]$.


Proof. Suppose that $R$ is a UFD, and let $S = R[x]$. Assume for a contradiction
that $(f_0) \subsetneq (f_1) \subsetneq (f_2) \subsetneq \cdots$ is a strictly increasing chain of principal ideals
in $S$. Let $\mathfrak{c}_i = \mathrm{cont}(f_i)$. Since $f_{i+1}$ divides $f_i$ in $S$, Gauss' Lemma yields
$\mathfrak{c}_0 \subseteq \mathfrak{c}_1 \subseteq \mathfrak{c}_2 \subseteq \cdots$. By Theorem 20.26, this sequence of principal ideals
of $R$ must stabilize; say $\mathfrak{c}_i = (c)$ for $i \gg 0$, with $c \in R$. Also, we have
$\deg(f_0) \ge \deg(f_1) \ge \deg(f_2) \ge \cdots \ge 0$ since $f_{i+1}$ divides $f_i$ in $S$. Since all of
these degrees are natural numbers, this sequence of degrees must eventually
stabilize; say $\deg(f_i) = d$ for all $i \gg 0$. So for large enough $i$, we have
$f_i = a_i f_{i+1}$ with $\deg(a_i) = 0$, hence $a_i \in R$. Taking contents yields $(c) =
(a_i) \cdot (c) = (a_i c)$ for $i \gg 0$. It follows that $a_i \in R^\times$, so $(f_{i+1}) = (f_i)$, a
contradiction. Therefore, $S$ satisfies the ascending chain condition on principal
ideals.

Next, let $f$ be an irreducible element of $R[x]$. [Show: $f$ is prime in $R[x]$.]

Case 1: $\deg(f) = 0$. Then $f \in R$. If $f = ab$ with $a, b \in R$, then viewing
$a, b \in R[x]$, by irreducibility of $f$ in $R[x]$ we have (without loss of generality)
$a \in (R[x])^\times = R^\times$. So $f$ is also irreducible in $R$; hence $f$ is prime in $R$ by
Theorem 20.26. Now suppose that $f$ divides $gh$ in $R[x]$. Then we can write
$gh = fw$ for some $w \in R[x]$. Write $\mathrm{cont}(g) = (\gamma)$ and $\mathrm{cont}(h) = (\delta)$ with
$\gamma, \delta \in R$. Then we have $(\gamma) \cdot (\delta) = (\gamma\delta) = (f) \cdot \mathrm{cont}(w)$, so $f$ divides $\gamma\delta$ in $R$.
Without loss of generality, $f \mid \gamma$ in $R$. By Exercise 20.24(a), we have $f \mid g$ in
$R[x]$; hence, $f$ is prime in $R[x]$ in this case.

Case 2: $\deg(f) \ge 1$. Let $K = k(R)$ be the field of fractions of $R$. Note that
we have $R[x] \le K[x]$. By Corollary 20.44, $f$ is irreducible in $K[x]$. Since $K[x]$
is a UFD (by Corollary 20.28), $f$ is also prime in $K[x]$. Now suppose that
$g, h \in R[x]$ and $f$ divides $gh$ in $R[x]$. Then certainly $f$ divides $gh$ in $K[x]$,
and as $f$ is prime in $K[x]$, without loss of generality we may suppose
that $f$ divides $g$ in $K[x]$. So we can write $g = ft$ with $t \in K[x]$. There exists
$d \in R - (0)$ such that $dt \in R[x]$; for example, take $d$ to be the product of the
denominators of the coefficients of $t$. Then $dg = f \cdot (dt)$ is a factorization of
$dg$ in $R[x]$. By Gauss' Lemma, we have $(d) \cdot \mathrm{cont}(g) = \mathrm{cont}(dg) = \mathrm{cont}(f) \cdot
\mathrm{cont}(dt)$. Now $\mathrm{cont}(f) = (1)$ by Exercise 20.24(c). Therefore, $(d) \supseteq \mathrm{cont}(dt)$.
It now follows from Exercise 20.24(a) that $d$ divides $dt$ in $R[x]$, so $t \in R[x]$.
Since $g = ft$, we have shown that $f \mid g$ in $R[x]$. Thus, $f$ is prime in $R[x]$. So
by Theorem 20.26, $R[x]$ is a UFD.


We can apply Proposition 20.45 repeatedly by starting with a UFD and 
adding one variable after another; what we get is a polynomial ring in many 
variables, or “multivariate” polynomial ring. The notation for such rings is 
what we would expect (see also Exercise 14.11): 




Definition 20.46. Let $R$ be a commutative ring with 1. The polynomial ring
over $R$ in the variables $x_1, \ldots, x_n$ is defined recursively as $R[x_1, \ldots, x_n] =
(R[x_1, \ldots, x_{n-1}])[x_n]$.


Corollary 20.47. If $R$ is a UFD, then so is the polynomial ring $R[x_1, \ldots, x_n]$.


Remark 20.48. The order in which we adjoin variables does not affect the
structure of the resulting polynomial ring: we understand $R[x, y]$ and $R[y, x]$
to be the same ring. It is often useful, however, to consider a formal
representation of a multivariate polynomial in terms of the order in which we adjoin
the variables. For example, we can consider $f := x^2 y - x^3 + xy \in \mathbb{Z}[x, y]$ as the
polynomial $(-1)x^3 + (y)x^2 + (y)x \in (\mathbb{Z}[y])[x]$ or as $(x^2 + x)y + (-x^3) \in (\mathbb{Z}[x])[y]$.
Accordingly, we speak of the degree in $x$ and the degree in $y$. Here, we have
$\deg_x(f) = 3$ and $\deg_y(f) = 1$.
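
The two views of the polynomial in Remark 20.48 can be sketched in Python. The dictionary representation and the helper names `degree_in` and `collect` are assumptions made for this illustration only; they are not part of the text.

```python
from collections import defaultdict

# f = x^2*y - x^3 + x*y in Z[x, y], stored as {(power of x, power of y): coefficient}.
f = {(2, 1): 1, (3, 0): -1, (1, 1): 1}


def degree_in(poly, var):
    """Degree of poly in the variable with index var (0 for x, 1 for y)."""
    return max(exps[var] for exps, coeff in poly.items() if coeff != 0)


def collect(poly, var):
    """Group terms by the power of one variable.

    Returns {power of chosen variable: {power of other variable: coefficient}},
    i.e. the polynomial viewed as an element of (Z[other])[chosen].
    """
    other = 1 - var
    grouped = defaultdict(dict)
    for exps, coeff in poly.items():
        if coeff != 0:
            grouped[exps[var]][exps[other]] = coeff
    return dict(grouped)


print(degree_in(f, 0), degree_in(f, 1))  # degree in x, then degree in y
print(collect(f, 0))                     # f as a polynomial in x over Z[y]
print(collect(f, 1))                     # f as a polynomial in y over Z[x]
```

Both groupings hold the same three monomials, which reflects the remark that $R[x, y]$ and $R[y, x]$ are the same ring.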


20.4 Exercises 


Exercise 20.1. Let $R$ be a domain and let $I$ be an ideal of $R$. Prove that $R - I$
is a multiplicative set iff $I$ is prime.
Exercise 20.2. Let $U = \mathbb{Z} - (2)$ and $V = \mathbb{Z} - (0)$. Describe as explicitly as
you can the fraction 2/3 as an element of $\mathbb{Z}[U^{-1}]$ and also as an element of
$\mathbb{Z}[V^{-1}]$, using the definition of a fraction as a certain set of ordered pairs.
In particular, show that 2/3 formally means two different things in these two
rings.
Exercise 20.3. Under the hypotheses of Proposition 20.4, show that the set of
fractions $a/u$ with $a \in R$ and $u \in U$ forms a commutative ring $S$ under the
given operations, with $0_S = 0_R/1_R$ and $1_S = 1_R/1_R$. You may use the fact
that the operations are well-defined.
Exercise 20.4. Let $T$ and $S$ be commutative rings with 1, and let $\tau : T \to S$
be a ring homomorphism. Let $N \subseteq T^\times$ be such that $1_T \in N$ and $\tau(N) \subseteq S^\times$.
Let $M = \langle N \rangle$ be the subgroup of $T^\times$ generated by $N$.

(a) Prove that $\tau(M) = \langle \tau(N) \rangle \le S^\times$.

(b) Prove that $\tau|_M : M \to S^\times$ is a group homomorphism.
Exercise 20.5. Prove that the property of localization described in Lemma 20.9
characterizes the localization in the following sense. Let $R$ be a domain, $U$ a
multiplicatively closed subset of $R$, and $T$ a domain with a ring homomorphism
$\tau : R \to T$ such that $\tau(U) \subseteq T^\times$. Suppose that for every ring homomorphism
$\sigma : R \to S$ such that $\sigma(U) \subseteq S^\times$, there is a unique ring homomorphism
$\omega : T \to S$ such that $\omega \circ \tau = \sigma$. Prove that $T \cong R[U^{-1}]$.
Exercise 20.6. Let $R$ be a domain, and let $U$ be a multiplicatively closed
subset of $R$ such that $U \subseteq R^\times$. Prove that $R[U^{-1}] \cong R$.




Exercise 20.7. Let $R$ be a domain, and let $U$, $V$ be multiplicatively closed
subsets of $R$ with $U \subseteq V$. Prove that $(R[U^{-1}])[V^{-1}] = R[V^{-1}]$.
Exercise 20.8. Let $F$ be a field of characteristic 0. Prove that $\mathbb{Q} \subseteq F$.


Exercise 20.9. Let $D$ be a domain, and let $K = k(D)$ be the field of fractions
of $D$. Prove that $k(D[x]) = K(x)$.


Exercise 20.10. Let $R$ be a commutative ring with 1. Define a relation $\sim$ on
$R$ by $a \sim b$ iff $a$ and $b$ are associates in $R$. Prove that $\sim$ is an equivalence
relation.

Exercise 20.11. Let $R$ be a domain, and let $a$ and $b$ be irreducible elements
of $R$. Prove that $a \mid b$ in $R$ iff $a$ is an associate of $b$ in $R$.


Exercise 20.12. Let $R$ be a commutative ring with 1 and let $a$ be a non-zero
non-unit of $R$. Prove that $a$ is a prime element of $R$ iff for all $a_1, a_2, \ldots, a_n
\in R$ with $n \ge 1$, we have $a \mid (a_1 a_2 \cdots a_n) \implies \exists\, i$ such that $a \mid a_i$.


Exercise 20.13. Let $R$ be a domain, and suppose that whenever $a_1, \ldots, a_r$, $b_1,
\ldots, b_s$ are irreducible elements of $R$ with $r, s \ge 1$ such that $\prod_{i=1}^{r} a_i = \prod_{i=1}^{s} b_i$,
then each $a_i$ is an associate of some $b_j$. Prove by induction on $r + s$ that we
must have $r = s$ under these circumstances.


Exercise 20.14. Let $R$ be a commutative ring with 1, and let $I_0 \subseteq I_1 \subseteq I_2 \subseteq
\cdots$ be an ascending chain of ideals of $R$. Set $J = \bigcup_{i=0}^{\infty} I_i$. Prove that $J$ is an
ideal of $R$.


Exercise 20.15. Prove Lemma 20.32. 


Exercise 20.16. Let $R$ be a UFD, and let $I$ be a proper principal ideal of $R$.
Prove that $\bigcap_{n=1}^{\infty} I^n = (0)$. (This motivates the definition of $I^{\infty}$ as $(0)$.) Hint:
Assume for a contradiction that $0 \ne b \in I^n$ for all $n \in \mathbb{Z}^+$, and consider the
prime factorizations of $b$ and of a generator $a$ of $I$.


Exercise 20.17. Let $F$ be a field. Let $R$ be the set of all expressions of the form
$a_0 + a_1 x^{1/2^n} + a_2 x^{2/2^n} + \cdots + a_r x^{r/2^n}$ with $a_i \in F$ and $r, n \in \mathbb{N}$. Convince
yourself that $R$ forms a domain under the natural operations $+$ and $\cdot$, and
prove that $R$ does not satisfy the ascending chain condition on principal ideals.


Exercise 20.18. Let R be a UFD. 

(a) For any $a \in R$, prove that $\gcd(a) = (a)$; viewing $a$ as a constant
polynomial in $R[x]$, conclude that $\mathrm{cont}(a) = (a)$.

(b) For any non-empty set $A \subseteq R$, prove that $\gcd(A \cup \{0\}) = \gcd(A)$.


Exercise 20.19. Let $R$ be a UFD and let $A$ be a non-empty subset of $R$. Let
$b \in R$ be a generator for $\gcd(A)$.

(a) Prove that if $c \in R$ and $c \mid a$ for all $a \in A$, then $(c) \supseteq \gcd(A) = (b)$.

(b) Prove that for all $a \in A$, we have $b$ divides $a$ in $R$.

(c) Let $\tilde{A} = \{\tilde{a} \in R : a = \tilde{a} b \text{ for some } a \in A\}$. Prove that $\gcd(\tilde{A}) = (1)$.
(Note that if $b \ne 0$ then we can write $\tilde{A} = \{a/b : a \in A\}$.)

(d) Prove that if $f \in R[x]$, $c \in R$, and $c \mid f$ in $R[x]$, then $(c) \supseteq \mathrm{cont}(f)$.




Exercise 20.20. Let $R$ be a UFD, and let $a \in k(R)$. Prove that we can write
$a = n/d$ for some $n, d \in R$ with $d \ne 0$ and $\gcd(n, d) = (1)$. Hint: Exercise
20.19 may help.

Exercise 20.21 (A UFD which is not a PID). Let $F$ be a field, and let $R =
F[x, y]$. Let $I = (x, y)$ be the ideal of $R$ generated by the variables $x$ and $y$.
Prove that $I$ is not principal. Hint: Assume for a contradiction that $\exists\, f \in R$
such that $(f) = I$, and consider the degree of $f$ in the variables $x$ and $y$
separately.


Exercise 20.22. Let $R$ be a UFD, and let $A \subseteq R$ be a non-empty subset of $R$.

(a) Prove that we have $\gcd(A) \supseteq (A)$.

(b) Prove that $\gcd(A) = (A)$ iff $(A)$ is a principal ideal.

(c) Find an example to show that we may have $\gcd(A) \ne (A)$.

Exercise 20.23. Let $R$ be a UFD, and let $K = k(R)$ be the field of fractions
of $R$.

(a) For a principal prime ideal $\mathfrak{p}$ of $R$ and a fraction $f = a/b \in K$ with
$a, b \in R - (0)$, define $\exp_{\mathfrak{p}}(f) = \exp_{\mathfrak{p}}(a) - \exp_{\mathfrak{p}}(b)$. Show that this extension
of exp is well-defined.

(b) Prove that for every $f \in K$, we have $f \in R$ iff $\exp_{\mathfrak{p}}(f) \ge 0$ for all
principal prime ideals $\mathfrak{p}$ of $R$.

(c) Let $\mathcal{P}$ be the set of all non-zero principal prime ideals of $R$. Define
a function $\omega : K^\times \to \bigoplus_{\mathfrak{p} \in \mathcal{P}} (\mathbb{Z}, +)$ by $f \mapsto (\exp_{\mathfrak{p}}(f))_{\mathfrak{p} \in \mathcal{P}}$. Prove that $\omega$ is a
surjective group homomorphism with $\ker(\omega) = R^\times$. (See Exercise 17.12 for
this notation.)


Exercise 20.24. Let $R$ be a UFD, and let $f \in R[x] - (0)$. Write $\mathrm{cont}(f) = (c)$
with $c \in R$. Let $K = k(R)$ be the field of fractions of $R$.

(a) Note that $c \ne 0_R$ and $1/c \in K$, so that $(1/c) \cdot f \in K[x]$, and prove
that in fact $(1/c) \cdot f \in R[x]$. Conclude that $c$ divides $f$ in $R[x]$.

(b) Observe that as $c \in R \le K \le K[x]$, we can form the fraction $f/c \in
K(x)$. Convince yourself that $f/c = (1/c) \cdot f$ under the identification of $K[x]$
as a subring of $K(x)$.

(c) Prove that if $f$ is irreducible in $R[x]$ and $\deg(f) \ge 1$, then $\mathrm{cont}(f) = (1)$.


Exercise 20.25 (A domain which is not a UFD). Let $R = \{a + bi\sqrt{6} : a, b \in \mathbb{Z}\}$.
(a) Show that $R \le \mathbb{C}$. Conclude (e.g., from Exercise 11.11) that $R$ is a
domain.
(b) Show that $\mathbb{Q}[i\sqrt{6}] =: K$ is (isomorphic to) the field of fractions of $R$.
(c) Show that $K$ is a finite Galois extension of $\mathbb{Q}$, and find the elements
of $G := \mathrm{Gal}(K/\mathbb{Q})$ explicitly.
(d) Define a function $N : K \to K$ by $N(\alpha) = \prod_{\sigma \in G} \sigma(\alpha)$. Prove that
$\forall\, \alpha, \beta \in K$, we have $N(\alpha\beta) = N(\alpha)N(\beta)$.
(e) Prove that in fact we have $N(K) \subseteq \mathbb{Q}$.
(f) Find an explicit formula for $N$, and prove that $N(R) \subseteq \mathbb{N}$.
(g) Prove that for all $a \in R$, we have $a \in R^\times$ iff $N(a) \in \mathbb{Z}^\times$. Use this to
show that $R^\times = \{-1, 1\}$.




(h) Prove that 5 is irreducible in R but not prime in R. (The facts proved 
above are useful here, or you can prove this directly.) Conclude that R is not 
a UFD. 


Exercise 20.26. Let $R$ be a UFD, and let $K = k(R)$ be the field of fractions of
$R$. Let $F$ be an extension field of $K$. We say that an element $a \in F$ is integral
over $R$ if $f(a) = 0$ for some non-zero monic polynomial $f \in R[x]$.

(a) Prove that if $a$ is integral over $R$ and $a \in K$, then $a \in R$.

(b) (For those who know about eigenvalues) Suppose that M is a square 
matrix with entries in Z, and someone shows you a list of purported eigenval- 
ues for M which includes the value 1.371. Can this value be exact? 




21 


The Problems of the Ancients 


21.1 Introduction 


Until this point, our goal has been to generalize the concepts found in ordinary 
number systems, and study the resulting structures. We have only occasionally 
used these structures to shed light on our original number systems. In the 
present chapter, we shift our focus, and aim our theory squarely at certain 
questions about numbers and geometry which were first raised many centuries 
ago. 


21.2 Constructible Numbers 


Before the invention of algebra, and even before the modern (Arabic)
representation of numbers existed, there was geometry. In this section, we consider
the problem of so-called straightedge and compass constructions: we are given 
a straight, rigid bar of length 1, together with a compass. We may think of 
our “compass” as a long string together with a pencil which can be tied to one 
end of the string. By holding the other end of the string in place, we can draw 
a circle. The question is, which real numbers occur as the lengths of the line 
segments which can be constructed in a finite number of steps? Informally, we 
define the set of constructible numbers to be 


$\mathcal{C} = \{a \in \mathbb{R} \mid \pm a \text{ can be constructed with straightedge and compass}\}$.


Our first goal is to wrangle the physical operations which can be performed 
by the straightedge and compass into mathematical operations. We will have 
to come to an agreement about exactly which operations are allowed. Let us 
agree on the following: 

(1) Given two points, we can draw a line segment between them—imagine 
stretching the string taut between the points and using the pencil and straight- 
edge, one unit at a time. 


DOI: 10.1201/9781003252139-21




[Figure: Operation (1), drawing the line segment between two given points A and B]


(2) Given that a line segment of length a can be constructed, and given a 
point P on some line segment, we can use the straightedge and compass to 
extend the line segment from the point P by a length of a units in either 
direction. 


[Figure: Operation (2), extending a line segment from the point P by a length a]


(3) Given two points A and B, we can place the two ends of the compass at 
these points and draw the circle with center A which passes through B. 


[Figure: Operation (3), drawing the circle with center A passing through B]


(4) We can identify the point where two given line segments intersect, the 
point(s) where two given circles intersect, and the point(s) where a given 
circle intersects a given line segment, when these intersection points exist. 


[Figure: Operation (4), identifying intersection points]

Notice that we do not allow “random” line segments or circles to be drawn:
all of the operations listed above assume that we are given points, circles, or
line segments which already exist, and proceed in a deterministic fashion. But
clearly we need to have some kind of starting place; so we allow the one-time
use of the following operation:




(0) To begin with, we can run the pencil along the straightedge to produce 
a line segment of length 1. 


[Figure: Operation (0), a line segment AB of length 1]

As a first illustration of what can be done with these operations, we prove 
the following useful result. 


Lemma 21.1. Given a line segment $\ell$ and a point $P \in \ell$, we can construct a
line segment perpendicular to $\ell$ at $P$.


Proof. Note that we must have begun our operations by constructing a line
segment of length 1 using Operation (0). We can therefore use Operation (2)
to construct two distinct line segments $PQ_1$ and $PQ_2$ which lie along $\ell$ and
have length 1. Using Operation (4), we identify the points $Q_1$ and $Q_2$.


FIGURE 21.1: Constructing a line segment perpendicular to a given segment
through a given point


Next, we use Operation (3) to draw the circle $C_1$ with center $Q_1$ which
passes through $Q_2$. Then we draw the circle $C_2$ centered at $Q_2$ and passing
through $Q_1$. The intersection points of these two circles are identifiable by
Operation (4); choose one of them and call it $T$. Finally, the line segment $PT$
can be drawn using Operation (1), and this segment has the desired properties;
see Figure 21.1.


Now we return to the question of which real numbers can be constructed 
by straightedge and compass. The main insight here is that our operations are 
(basically) just linear and quadratic in nature. Algebraically, linear operations 
will allow us to do elementary arithmetic—what we now recognize as field 
operations. Quadratic operations allow us to take square roots—form quadratic 
extension fields. 


Claim 21.2. (I) 1 is constructible. 




(II) If $a$ and $b$ are constructible numbers, then so are $a + b$, $-a$, and $a \cdot b$;
further, if $a \ne 0$, then $1/a$ is constructible too.

(III) If $a$ is a constructible number and $a > 0$, then $\sqrt{a}$ is also constructible.

(IV) All constructible numbers can be produced from the number 1 in a 
finite number of steps by using the algebraic operations of addition, negation, 
multiplication, multiplicative inverse, and square root. 


Proof. (I). Operation (0) gives us a line segment of length 1.

(II). We treat the case when $a, b > 0$. We have agreed that saying $-a$ is
constructible means the same thing as saying that $a$ is constructible; thus $-a$
is indeed constructible.

Starting with a line segment $A_1A_2$ of length $a$ and a line segment $B_1B_2$
of length $b$, we can extend $A_1A_2$ from $A_2$ in the direction away from $A_1$ by
length $b$, using Operation (2); this gives a line segment of length $a + b$.

To construct $a \cdot b$, take a line segment $AB$ of length 1 and construct a
perpendicular segment $\ell$ passing through $B$ (see Figure 21.2), which is possible
by Lemma 21.1. Since $a$ is constructible, we may identify a point $C$ on $\ell$ which
lies at distance $a$ from $B$, using Operations (2) and (4). Let $\lambda = AC$. Next,
identify the point $D$ which lies at distance $b$ from $A$ in the direction of $B$.
Draw a perpendicular $\Lambda$ to $AD$ through $D$ (on the same side of $AD$ as $C$).
By extending the segments $\lambda$ and $\Lambda$, if necessary, we are guaranteed to get a
point $E$ where they intersect. The triangles $ABC$ and $ADE$ are similar, so
$|DE|/|AD| = |BC|/|AB|$, and we see that the length of the segment $DE$ is
$a \cdot b$.


FIGURE 21.2: Constructing $a \cdot b$ given $a$ and $b$


The construction of 1/a is left to the reader as Exercise 21.2. 

(III). Finally, we show how to construct $\sqrt{a}$ (see Figure 21.3, where the
case a > 3 is illustrated). First suppose that $a > 1$. Since $a$ is constructible, so
is $(a - 1)/2$, by parts (I) and (II) above. So we can construct a line segment
$AC$ containing an intermediate point $B$ such that both $AB$ and $BC$ have
length $(a - 1)/2$ (we use here that $a > 1$). Similarly, $(a + 1)/2$ is constructible,
so we can identify a point $D$ at distance $(a + 1)/2$ from $A$ in the direction
of $C$, and a point $E$ at distance $(a + 1)/2$ from $C$ in the direction of $A$. We
draw the circle with center $A$ passing through the point $D$, and the circle
with center $C$ passing through the point $E$. These two circles intersect at a
point $F$ on one side of $AC$. The Pythagorean Theorem applied to the right
triangle $\triangle ABF$ shows that the length of $BF$ is $\sqrt{a}$. If $a < 1$, then we use
the construction above to find $\sqrt{1/a}$, and then construct the multiplicative
inverse of this number.


FIGURE 21.3: Constructing $\sqrt{a}$ given $a$

The verification of (IV) is left as Exercise 21.3. 
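
The Pythagorean step in part (III) can be checked with exact rational arithmetic. In the sketch below, the function name `bf_squared` and the sample values of $a$ are illustrative assumptions; the hypotenuse $AF$ is a radius of the circle centered at $A$, so it has length $(a+1)/2$, and the leg $AB$ has length $(a-1)/2$.

```python
from fractions import Fraction


def bf_squared(a):
    """Squared length of BF in the square-root construction, computed exactly."""
    a = Fraction(a)
    hypotenuse = (a + 1) / 2   # |AF|, the radius of the circle centered at A
    leg = (a - 1) / 2          # |AB|
    return hypotenuse**2 - leg**2


# Hypothetical sample inputs with a > 1.
for a in (Fraction(3), Fraction(7, 2), Fraction(10)):
    print(a, bf_squared(a))
```

In each case the squared length of $BF$ equals $a$, which is the identity $((a+1)/2)^2 - ((a-1)/2)^2 = a$ behind the construction.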


Observe that if $a$ and $b$ are elements of some field $F$, then the results of
any of the operations in parts (I) or (II) of Claim 21.2 also lie in $F$; while
performing the square root operation in part (III) gives a result which is at
worst in a quadratic extension of $F$. Thus, we would like to assert that any
constructible number lies inside a tower of successive quadratic extensions of
$\mathbb{Q}$, and conversely. We choose to make this assertion as a formal definition of
constructibility:


Definition 21.3. A real number $a$ is constructible if there is a tower of fields
$\mathbb{Q} = F_0 \le F_1 \le F_2 \le \cdots \le F_n \le \mathbb{R}$ such that $[F_{i+1} : F_i] = 2$ for all
$i \in \{0, 1, \ldots, n-1\}$ and $a \in F_n$.


Lemma 21.4. If $a$ is constructible, then $[\mathbb{Q}[a] : \mathbb{Q}] = 2^r$ for some $r \in \mathbb{N}$.


Proof. Suppose $a$ is constructible, and let $K = \mathbb{Q}[a]$. Let $F_n$ be as in the
definition of constructible. Then $[F_n : \mathbb{Q}] = 2^n$ (using Exercise 15.9 and
Lemma 15.26). Since $a \in F_n$, we have $K \le F_n$. Now we can say $[F_n : K] \cdot [K :
\mathbb{Q}] = 2^n$, so $[K : \mathbb{Q}]$ is also a power of 2.


To determine whether a number a is constructible, it will be useful to be 
able to find its irreducible polynomial over Q, since this tells us the degree of 
Q[a] over Q. It is therefore a good time to discuss a criterion for irreducibility. 




Lemma 21.5 (Eisenstein’s Criterion). Let $R$ be a domain, and let $f = x^n +
a_{n-1}x^{n-1} + \cdots + a_1 x + a_0$ be a monic polynomial in $R[x]$ of degree $n \ge 1$. If
there is a prime ideal $P$ of $R$ such that $a_0, a_1, \ldots, a_{n-1} \in P$ but $a_0 \notin P^2$, then
$f$ is irreducible in $R[x]$.


Proof. Suppose that $f$ and $P$ are as in the hypothesis of the lemma, and
that $f = gh$ with $g, h \in R[x]$. Let $\ell = \deg(g)$ and $m = \deg(h)$. Write $f =
\sum_{j=0}^{n} a_j x^j$, $g = \sum_{j=0}^{\ell} b_j x^j$, and $h = \sum_{j=0}^{m} c_j x^j$ with $a_j, b_j, c_j \in R$ and $a_n = 1$,
$a_j \in P$ for $j : 0 \le j \le n-1$, and $a_0 \notin P^2$. We have $b_\ell \cdot c_m = 1$ (since $f$ is monic),
so we may replace $g$ by $c_m g$ and $h$ by $b_\ell h$ to get $g$ and $h$ monic. Since $a_0 = b_0 c_0$
and $a_0 \notin P^2$, either $b_0 \notin P$ or $c_0 \notin P$; without loss of generality, suppose
$b_0 \notin P$. We know now that $c_m = 1 \notin P$, so it makes sense to set $k = \min\{j \mid
c_j \notin P\}$. The coefficient of $x^k$ in $gh$ is $a_k = b_0 c_k + b_1 c_{k-1} + \cdots + b_{k-1} c_1 + b_k c_0$,
where, as usual, we fill in missing coefficients with 0. By choice of $k$, we have
$c_0, c_1, \ldots, c_{k-1} \in P$. Since $b_0, c_k \notin P$ and $P$ is prime, $b_0 c_k \notin P$. It
follows that $a_k \notin P$. Therefore, $k = n = \deg(f)$. This forces $\deg(h) = n$ and
$\deg(g) = 0$. Since $g$ is monic, $g = 1 \in (R[x])^\times$. Thus $f$ is irreducible in
$R[x]$.
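
For $R = \mathbb{Z}$ and $P = (p)$ with $p$ a prime number, the hypotheses of Eisenstein's Criterion reduce to divisibility checks on the coefficients. The following is a minimal sketch under that assumption; the function name `eisenstein_applies` is hypothetical, and the code does not verify that `p` is prime.

```python
def eisenstein_applies(coeffs, p):
    """Check the hypotheses of Lemma 21.5 for R = Z and P = (p).

    coeffs lists a_0, a_1, ..., a_{n-1}, a_n (constant term first);
    p is assumed to be a prime number (this is not checked).
    """
    *lower, lead = coeffs
    if lead != 1 or not lower:
        return False  # the lemma needs a monic polynomial of degree at least 1
    all_in_p = all(a % p == 0 for a in lower)    # a_0, ..., a_{n-1} lie in (p)
    a0_not_in_p2 = lower[0] % (p * p) != 0       # a_0 does not lie in (p)^2
    return all_in_p and a0_not_in_p2


print(eisenstein_applies([-2, 0, 0, 1], 2))   # x^3 - 2 with p = 2
print(eisenstein_applies([-4, 0, 1], 2))      # x^2 - 4 with p = 2
```

A result of `False` only means that the criterion does not apply for that prime; it says nothing by itself about whether the polynomial is reducible.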


Now let us start to confront some of the problems of the ancients. One of
the classic questions in geometry asks whether it is possible to “double the
cube.” By this we mean: given a constructible number $\ell > 0$, representing the
side length of a cube, is it possible to construct a number $d$ such that a cube
with side length $d$ has exactly twice the volume of the original cube?


Theorem 21.6. It is impossible to double the cube using straightedge and 
compass. 


Proof. Let $\ell$ be a positive constructible number. We solve $d^3 = 2\ell^3$ to get
$d = \ell\sqrt[3]{2}$. Assume for a contradiction that $d$ is constructible. Then the ratio
$r = d/\ell = \sqrt[3]{2}$ is also constructible. Now $r$ is a root of the polynomial $f(x) :=
x^3 - 2 \in \mathbb{Z}[x]$. Using Eisenstein’s Criterion with $R = \mathbb{Z}$ and $P = (2)$, we see
that $f$ is irreducible in $\mathbb{Z}[x]$. Since $\mathbb{Z}$ is a UFD, $f$ is also irreducible in
$\mathbb{Q}[x]$, by Corollary 20.44. So $f = \mathrm{Irr}(r, \mathbb{Q}, x)$ by Exercise 15.2, and $[\mathbb{Q}[r] :
\mathbb{Q}] = 3$ by Lemma 15.25. But 3 is not an integer power of 2. Thus, by Lemma
21.4, $r$ is not constructible. This contradiction completes the proof.


To adapt some lines of Vergil (Aeneid II:195–198):


Thus abstract algebra prevailed, 
Where two thousand years of earlier math had failed. 


One of the most famous geometric problems of all time is the problem
of trisecting an angle. We interpret this problem as follows. Given two line
segments which meet at an angle $\theta$, construct a pair of line segments which
meet at angle $\theta/3$. This question is not strictly within the realm of algebra for
us yet; we need to rephrase it using the definition of constructible numbers.




So suppose that we have been able to construct two line segments $s_1 = PQ$
and $s_2 = PR$ which intersect at a point $P$, making an angle $\theta$. Then we can
drop a perpendicular from $Q$ to $s_2$, using straightedge and compass, as follows.
First, construct a number $r$ such that $r$ is greater than the distance from $Q$
to $s_2$; this can always be done. Then draw the circle $C$ with center $Q$ and
radius $r$. Let $A$ and $B$ be the intersection points of $C$ with the line containing
$s_2$. Draw the two circles with centers $A$ and $B$ and passing through $B$ and
$A$, respectively. These circles intersect each other at two points $D$ and $E$.
The line segment $DE$ passes through $Q$ and is perpendicular to $s_2$. Let $F$ be
the intersection point of $DE$ and $s_2$. Then the number $|PF| / |s_1| = \cos(\theta)$
is constructible. Conversely, if $\cos(\theta)$ is constructible, then it is not hard to
see that we can reverse this process to draw two line segments which meet at
angle $\theta$. This motivates the following.


Definition 21.7. We will say that a number $\theta \in \mathbb{R}$ is constructible as an
angle (or, more briefly, that the angle $\theta$ is constructible) if the number $\cos(\theta)$
is constructible.


Theorem 21.8. It is impossible to trisect the angle with straightedge and
compass. More precisely, there exists a constructible angle $\theta$ such that the
angle $\theta/3$ is not constructible.


Proof. Let $\theta = 60^\circ = \pi/3$ radians. Then $\cos(\theta) = 1/2$, so $\theta$ is a constructible
angle. We recall the trigonometric identity

$\cos(3\phi) = 4\cos^3(\phi) - 3\cos(\phi)$.


(See Exercise 21.7 if you are not familiar with this identity.) Substituting $\pi/9$
for $\phi$ gives $\cos(\pi/3) = 4\cos^3(\pi/9) - 3\cos(\pi/9)$. Therefore, $\cos(\pi/9)$ is a root
of the polynomial $g := 4x^3 - 3x - 1/2$. Let $h = 2g = 8x^3 - 6x - 1 \in \mathbb{Z}[x]$.
Unfortunately, when we divide $h$ by 8 to get a monic polynomial, the result
does not have integer coefficients, so we cannot apply Eisenstein’s Criterion
directly. In order to use Eisenstein’s Criterion, we employ a trick to get a nice
monic polynomial in $\mathbb{Z}[x]$. Namely, set $\alpha = \cos(\pi/9)$ and $\beta = 2\alpha - 1$. From
the equation $8\alpha^3 - 6\alpha - 1 = 0$, we see that $(\beta + 1)^3 - 3(\beta + 1) - 1 = 0$, so
$\beta$ is a root of $f := x^3 + 3x^2 - 3$. Now $f$ is irreducible in $\mathbb{Z}[x]$ by Eisenstein’s
Criterion (with $P = (3)$), so as in the proof of Theorem 21.6, we see that $[\mathbb{Q}[\beta] : \mathbb{Q}] = 3$.
Notice that $\mathbb{Q}[\beta] = \mathbb{Q}[\alpha]$. Therefore, by Lemma 21.4, $\alpha$ is not a constructible number, hence
$\pi/9$ is not a constructible angle.
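
The substitution in this proof can be sanity-checked numerically. The sketch below uses floating-point arithmetic, so it only confirms the equations up to rounding error; the variable names are illustrative. It also lists the values of $f = x^3 + 3x^2 - 3$ at the integer divisors of its constant term, which is a separate cross-check based on the standard fact that a rational root of a monic integer polynomial must be such a divisor.

```python
import math

alpha = math.cos(math.pi / 9)
beta = 2 * alpha - 1

# alpha should be a root of h = 8x^3 - 6x - 1 (up to rounding error).
h_at_alpha = 8 * alpha**3 - 6 * alpha - 1
# beta should be a root of f = x^3 + 3x^2 - 3 (up to rounding error).
f_at_beta = beta**3 + 3 * beta**2 - 3


def f(x):
    return x**3 + 3 * x**2 - 3


# Candidate rational roots of the monic polynomial f: integer divisors of -3.
candidates = [1, -1, 3, -3]
values = [f(c) for c in candidates]

print(h_at_alpha, f_at_beta)
print(values)
```

None of the candidate values is zero, so the cubic $f$ has no rational root, which agrees with the irreducibility obtained from Eisenstein's Criterion.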


21.3 Constructible Regular Polygons


Next we turn to the subject of polygons. Recall that a polygon is called regular 
if all its sides have the same length. We will ask which regular polygons can 
be constructed by straightedge and compass. 




Suppose that $P$ is a regular polygon with $n$ sides ($n \ge 3$); we call $P$ a
regular $n$-gon. Then $P$ has $n$ vertices $v_1, \ldots, v_n$; we index the vertices in
counterclockwise order. These vertices lie on a circle $C$. Let $s_i$ be the line
segment from the center of $C$ to the point $v_i$. Then the angle between $s_i$ and
$s_{i+1}$ (in radians) is $2\pi/n$. If we can construct this angle, then we can draw a
regular $n$-gon, and conversely. This motivates the following.


Definition 21.9. Let $n \in \mathbb{N}$, $n \ge 3$. To say that we can construct a regular
$n$-gon with straightedge and compass means that $2\pi/n$ is constructible as an
angle.


Thus, we want to know for which $n$ the number $\cos(2\pi/n)$ is constructible.
This amounts to asking about the structure of the field extension $\mathbb{Q}[\cos(\theta)]/\mathbb{Q}$,
where $\theta = 2\pi/n$. In turn, this involves understanding the polynomials satisfied
by cosines, in other words, trigonometric identities.

We saw in our study of the angle trisection problem that we can express
$\cos(3\alpha)$ as a polynomial of degree 3 in $\cos(\alpha)$. It would seem that some kind
of generalized “multiple-angle formula” could help us here, since $\cos(n\theta) =
\cos(2\pi) = 1$ is a nice number, and we expect $\cos(n\theta)$ to be a polynomial in
$\cos(\theta)$. While this is true, it is also true that the multiple-angle formulas grow
more and more complicated as the multiple, $n$, increases. In order to simplify
the tangle of equations and achieve a deeper understanding of trigonometric
identities, we recall the formula


$e^{i\alpha} = \cos(\alpha) + i\sin(\alpha)$. (21.1)


We may view this formula as a grand unification of the natural exponential
function and the trigonometric functions by way of the field of complex
numbers. The basic identities satisfied by the natural exponential function $e^x$ are
much simpler than our trig identities, which is why we prefer to work with
exponentials. We can recover cosines from exponentials using the following
consequence of Formula 21.1:


$\cos(\theta) = \dfrac{e^{i\theta} + e^{-i\theta}}{2}$. (21.2)


So it makes sense to study the complex number $e^{i\theta}$.


Notation 21.10. For an integer $n \ge 1$, we let $\zeta_n = e^{2\pi i/n}$ and we call $\zeta_n$ a
primitive $n^{\text{th}}$ root of unity.

The name is justified by the following result.


Lemma 21.11. For any integer $n \ge 1$, the number $\zeta_n$ is a root of the
polynomial $x^n - 1 \in \mathbb{Q}[x]$, but is not a root of $x^m - 1$ for any integer
$m \in \{1, 2, \ldots, n-1\}$. In particular, $\zeta_n$ is algebraic over $\mathbb{Q}$.


Proof. We have $\zeta_n^n = (e^{2\pi i/n})^n = e^{2\pi i} = 1$. Let $m \in \mathbb{N} - \{0\}$, and suppose that
$\zeta_n^m = 1$. Then $1 = \zeta_n^m = e^{2\pi i m/n} = \cos(2m\pi/n) + i\sin(2m\pi/n)$. Therefore
$\cos(2m\pi/n) = 1$, so $2m/n$ is an even integer. This forces $n$ to divide $m$ in $\mathbb{Z}$,
and establishes the result.




We can restate the previous result in the language of group theory: 


Corollary 21.12. For an integer $n \ge 1$, the complex number $\zeta_n$ has order $n$
as an element of $\mathbb{C}^\times$.


The complex numbers $\zeta_n, \zeta_n^2, \ldots, \zeta_n^n$ lie on the unit circle with center 0 in
the complex plane, and are themselves the vertices of a regular $n$-gon. Thus,
they cut the unit circle into $n$ equal pieces. The prefix cyclo comes from the
Greek word for circle (as in cycle or bicycle), and the suffix tomic comes from
the Greek word for cut (as in atomic, meaning "cannot be cut", or appendectomy, a
procedure to cut out an appendix).


Definition 21.13. For an integer $n \ge 1$, we call $\mathbb{Q}[\zeta_n]$ the $n^{\text{th}}$ cyclotomic
field. We define $\Phi_n(x) := \mathrm{Irr}(\zeta_n, \mathbb{Q}, x)$ and call $\Phi_n(x)$ the $n^{\text{th}}$ cyclotomic
polynomial.


Now it is easy to find all the roots of $x^n - 1$:


Lemma 21.14. The complex numbers $1 = \zeta_n^0, \zeta_n, \zeta_n^2, \ldots, \zeta_n^{n-1}$ are all the
distinct roots of the polynomial $x^n - 1$ in $\mathbb{C}$.


Proof. For any $j \in \mathbb{Z}$, we have $(\zeta_n^j)^n = \zeta_n^{jn} = (\zeta_n^n)^j = 1^j = 1$, so $\zeta_n^j$ is a root of
$x^n - 1$. The group $\langle \zeta_n \rangle$ generated by $\zeta_n$ in $\mathbb{C}^\times$ has order $n$, by Corollary 21.12,
and so $\zeta_n^0, \ldots, \zeta_n^{n-1}$ are distinct, from Lemma 5.6. A non-zero polynomial over
a field cannot have more roots than its degree, by Theorem 14.25.


Corollary 21.15. For any integer $n \ge 1$, the polynomial $x^n - 1 \in \mathbb{Q}[x]$ factors
as


$x^n - 1 = \prod_{j=0}^{n-1} (x - \zeta_n^j)$. (21.3)


Proof. The linear polynomials $x - \zeta_n^j$ are irreducible in $\mathbb{C}[x]$ by Exercise 14.6,
hence prime by Corollary 20.28 and Theorem 20.26. Each one divides $x^n - 1$
by the Root-Factor Theorem. No two of these polynomials are associates of
each other for $0 \le j \le n-1$, so by unique factorization in $\mathbb{C}[x]$, the entire
product on the right-hand side of Equation 21.3 divides $x^n - 1$. The quotient
has degree 0 and is monic, so must be 1.

Alternatively, both sides of Equation 21.3 are zero at $\zeta_n^j$ when $0 \le j \le n-1$,
and are monic of degree $n$; so their difference is a polynomial of degree at most
$n - 1$ with at least $n$ distinct roots in $\mathbb{C}$. Hence this difference must be 0 by
Theorem 14.25.
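
Corollary 21.12 and Equation 21.3 can be spot-checked in floating-point arithmetic. In the sketch below, the helper names `zeta`, `order`, and `product_of_linear_factors`, the tolerance, and the sample values $n = 12$ and $x_0 = 0.7 + 0.3i$ are all illustrative assumptions; the checks hold only up to rounding error.

```python
import cmath


def zeta(n):
    """Floating-point approximation of e^(2*pi*i/n)."""
    return cmath.exp(2j * cmath.pi / n)


def order(z, n, tol=1e-9):
    """Smallest m in {1, ..., n} with z**m equal to 1 up to tol, or None."""
    for m in range(1, n + 1):
        if abs(z**m - 1) < tol:
            return m
    return None


def product_of_linear_factors(n, x):
    """Evaluate the right-hand side of Equation 21.3 at the complex number x."""
    result = 1
    for j in range(n):
        result *= x - zeta(n) ** j
    return result


n = 12
x0 = 0.7 + 0.3j
print(order(zeta(n), n))                                   # order of zeta_n
print(abs(product_of_linear_factors(n, x0) - (x0**n - 1)))  # gap between both sides
```

Evaluating both sides at a single sample point does not prove the factorization, but a large gap would reveal an error in the formula.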


What is the relationship of the cyclotomic field $\mathbb{Q}[\zeta_n]$ to the field
$\mathbb{Q}[\cos(2\pi/n)]$? The reader is encouraged to play with Equation 21.2 and
investigate this question before reading on.


Proposition 21.16. For any integer $n \ge 3$, we have $\mathbb{Q}[\cos(\theta)] \le \mathbb{Q}[\zeta_n]$ and
$[\mathbb{Q}[\zeta_n] : \mathbb{Q}[\cos(\theta)]] = 2$, where $\theta := 2\pi/n$.


Proof. Set $F = \mathbb{Q}[\cos(\theta)]$ and $K = \mathbb{Q}[\zeta_n]$. By Equation 21.2, we have

$\cos(\theta) = (\zeta_n + \zeta_n^{-1})/2 \in K$, (21.4)


so $F \le K$. Since $F \le \mathbb{R}$ but $\zeta_n \notin \mathbb{R}$ (since $n \ge 3$), we have $\zeta_n \notin F$, so
$K \ne F$. Therefore $[K : F] > 1$, by Exercise 15.5. On the other hand, we can
multiply Equation 21.4 by $2\zeta_n$ to see that $\zeta_n$ is a root of the polynomial $f :=
x^2 - 2\cos(\theta)x + 1 \in F[x]$, so $\deg(\mathrm{Irr}(\zeta_n, F, x)) \le 2$. Since $K = F[\zeta_n]$, we get $[K : F] \le 2$
by Lemma 15.25.
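
The two relations used in this proof, Equation 21.4 and the quadratic satisfied by $\zeta_n$ over $F$, can be checked numerically for small $n$. The function name `residuals` and the range of $n$ are illustrative assumptions, and the comparison is only valid up to floating-point rounding.

```python
import cmath
import math


def residuals(n):
    """Return the numerical gaps in Equation 21.4 and in the quadratic for zeta_n."""
    theta = 2 * math.pi / n
    z = cmath.exp(1j * theta)                      # approximation of zeta_n
    cosine_gap = abs((z + 1 / z) / 2 - math.cos(theta))
    quadratic_gap = abs(z * z - 2 * math.cos(theta) * z + 1)
    return cosine_gap, quadratic_gap


for n in range(3, 13):
    print(n, residuals(n))
```

Both gaps stay at the level of rounding error, consistent with $\zeta_n$ being a root of $x^2 - 2\cos(\theta)x + 1$.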


The extraordinary fact that the roots of $x^n - 1$ are all powers of $\zeta_n$ leads
to a particularly nice set of properties of the cyclotomic field $\mathbb{Q}[\zeta_n]$.


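Proposition 21.17. Let $n \in \mathbb{N}$, $n \ge 1$. The cyclotomic field $\mathbb{Q}[\zeta_n]$ is a finite
Galois extension of $\mathbb{Q}$, and the Galois group $G_n := \mathrm{Gal}(\mathbb{Q}[\zeta_n]/\mathbb{Q})$ is abelian.
Every element $\sigma \in G_n$ sends $\zeta_n$ to $\zeta_n^m$ for a unique $m \in \{0, 1, 2, \ldots, n-1\}$; and
for this $m$, we have $\gcd(m, n) = 1$. This gives an embedding $G_n \hookrightarrow (\mathbb{Z}/n\mathbb{Z})^\times$,
$\sigma \mapsto m + n\mathbb{Z}$.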


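Proof. Set $K = \mathbb{Q}[\zeta_n]$. By Proposition 16.23, $K/\mathbb{Q}$ is a separable extension.
By Theorem 15.21 together with Corollary 21.15, $L := \mathbb{Q}[\zeta_n^0, \zeta_n^1, \ldots, \zeta_n^{n-1}]$ is
the splitting field of $x^n - 1$ in $\mathbb{C}$. So $L$ is normal over $\mathbb{Q}$ by Proposition 16.25.
Now $\zeta_n \in L$, so $K \le L$; but also $\zeta_n^j \in K$ for all $j \in \mathbb{N}$, so $L \le K$. Therefore,
$L = K$. So $K$ is Galois over $\mathbb{Q}$. Next, let $\sigma \in G_n$, and set $H = \langle \zeta_n \rangle \le \mathbb{C}^\times$. Let
$\tau = \sigma|_H$. Then $\tau : H \to \mathbb{C}^\times$ is a group homomorphism, e.g. by Exercise 20.4.
By Proposition 7.4, $\tau(H)$ consists of $n^{\text{th}}$ roots of 1, hence $\tau(H) \subseteq H$. Since
$\sigma$ is injective, so is $\tau$; as $H$ is finite, this forces $\tau(H) = H$. So $\tau \in \mathrm{Aut}(H)$.
But $H$ is cyclic of order $n$, hence $H \cong Z_n := (\mathbb{Z}/n\mathbb{Z}, +)$. Thus $\mathrm{Aut}(H) \cong
\mathrm{Aut}(Z_n) \cong (\mathbb{Z}/n\mathbb{Z})^\times$ by Exercise 12.18. So we get a group homomorphism
$\omega : G_n \to (\mathbb{Z}/n\mathbb{Z})^\times$. Tracing through the details of the various maps in
question shows that $\omega(\sigma) = m + n\mathbb{Z}$ where $\sigma(\zeta_n) = \zeta_n^m$, as required. Finally,
if $\omega(\sigma) = 1 + n\mathbb{Z}$, then $\sigma(\zeta_n) = \zeta_n$, and so $\sigma = \mathrm{id}_K$. Thus, $\omega$ is an embedding
by Lemma 9.18.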


We can now reduce the problem of the constructibility of regular polygons 
to a fairly basic question about cyclotomic fields. 


Proposition 21.18. For any integer $n \ge 3$, the regular $n$-gon is constructible
if and only if $[\mathbb{Q}[\zeta_n] : \mathbb{Q}] = 2^r$ for some $r \in \mathbb{N}$.


Proof. ($\Rightarrow$): Suppose that the regular $n$-gon is constructible. This means
by definition that the number $\cos(2\pi/n)$ is constructible. So $[\mathbb{Q}[\cos(2\pi/n)] :
\mathbb{Q}] = 2^k$ for some $k \in \mathbb{N}$, by Lemma 21.4. Therefore $[\mathbb{Q}[\zeta_n] : \mathbb{Q}] = 2^{k+1}$, by
Proposition 21.16.

($\Leftarrow$): Let $K := \mathbb{Q}[\zeta_n]$, and suppose that $[K : \mathbb{Q}] = 2^r$ with $r \in \mathbb{N}$.
Let $F = \mathbb{Q}[\cos(2\pi/n)]$, and let $G = \mathrm{Gal}(K/\mathbb{Q})$. From Proposition 21.16, we
see that $[F : \mathbb{Q}] = 2^{r-1}$. By the Fundamental Theorem of Galois Theory
(FTGT), we have $F = K^H$ for some $H \le G$. By Proposition 21.17, $G$ is
abelian. So $H \trianglelefteq G$ (by Exercise 11.4), and $F$ is Galois over $\mathbb{Q}$ by FTGT. Let
$A = \mathrm{Gal}(F/\mathbb{Q})$. Then $|A| = [F : \mathbb{Q}]$ by FTGT. So $A$ is a finite 2-group. By
Exercise 19.13, there is a tower of subgroups $\{e\} = A_0 \le A_1 \le \cdots \le A_{r-1} = A$
with $[A_i : A_{i-1}] = 2$ for all $i \in \{1, 2, \ldots, r-1\}$. By FTGT, this tower of
subgroups corresponds to a tower of fields $\mathbb{Q} = F_{r-1} \le \cdots \le F_1 \le F_0 = F$
with $[F_i : F_{i+1}] = 2$ for all $i \in \{0, 1, \ldots, r-2\}$. Therefore, $\cos(2\pi/n)$ is
constructible.


By Proposition 21.18, the problem of the constructibility of regular $n$-gons
has been reduced to finding the degree of the irreducible polynomial of $\zeta_n$ over
$\mathbb{Q}$, i.e., the degree of $\Phi_n(x)$. Since the polynomial $x^n - 1$ is in the kernel of
evaluation at $\zeta_n$, we must have $\Phi_n \mid (x^n - 1)$ in $\mathbb{Q}[x]$. So let us try our hand
at factoring $x^n - 1$ for some small values of $n$. We have

$$x^1 - 1 = x - 1; \tag{21.5}$$
$$x^2 - 1 = (x - 1)(x + 1); \tag{21.6}$$
$$x^3 - 1 = (x - 1)(x^2 + x + 1). \tag{21.7}$$


But how can we be sure that this last factor, $f := x^2 + x + 1$, is irreducible in
$\mathbb{Q}[x]$? Since $f$ is just quadratic, we can "brute-force" the question by using the
quadratic formula, or we could use Exercise 14.7 together with the Rational
Root Theorem (see Exercise 21.4). But for a result that generalizes, we prefer
to use Eisenstein's Criterion (EC). Since EC does not apply to $f$ directly,
we must be clever. Let $y = x + 1$, and consider $g := f(y) = y^2 + y + 1
= (x+1)^2 + (x+1) + 1 = x^2 + 3x + 3$. Now $g$ is irreducible in $\mathbb{Z}[x]$ by
Eisenstein's Criterion, but how does that help with $f$? The key here is the
realization that the function

$$\Omega : \mathbb{Z}[x] \to \mathbb{Z}[x], \quad h(x) \mapsto h(x + 1)$$

is an automorphism of $\mathbb{Z}[x]$. This allows us to say that $f$ is irreducible in $\mathbb{Z}[x]$
iff $g = \Omega(f)$ is. (Compare the argument made in the proof of Theorem 21.8.)
Thus, Equation 21.7 gives a factorization of $x^3 - 1$ into irreducibles in $\mathbb{Z}[x]$.
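
The shift-then-apply-Eisenstein trick is easy to try by machine. The following is a minimal Python sketch, not part of the text's development: the helper names `shift_by_one` and `eisenstein_applies` are our own, and polynomials are assumed to be stored as lists of integer coefficients, lowest degree first.

```python
from math import comb

def shift_by_one(coeffs):
    """Return the coefficients of h(x + 1), given h as [c0, c1, ..., cd]."""
    out = [0] * len(coeffs)
    for k, c in enumerate(coeffs):
        # c * (x + 1)^k = c * sum_j C(k, j) * x^j
        for j in range(k + 1):
            out[j] += c * comb(k, j)
    return out

def eisenstein_applies(coeffs, p):
    """Check the hypotheses of Eisenstein's Criterion at the prime p."""
    *lower, lead = coeffs
    return (lead % p != 0
            and all(c % p == 0 for c in lower)
            and lower[0] % (p * p) != 0)

f = [1, 1, 1]          # x^2 + x + 1
g = shift_by_one(f)    # x^2 + 3x + 3
print(g)                           # [3, 3, 1]
print(eisenstein_applies(f, 3))    # False: EC does not apply to f directly
print(eisenstein_applies(g, 3))    # True: EC applies after the shift
```

The check only tests the hypotheses of the criterion; the conclusion that $f$ itself is irreducible still rests on the automorphism argument above.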

The same ideas can be used to factor the related polynomial $x^9 - 1$. Every
cube root of 1 is also a ninth root of 1, so it makes sense that $x^3 - 1$ divides $x^9 - 1$
in $\mathbb{Q}[x]$. In fact, we have $(x^9 - 1)/(x^3 - 1) = x^6 + x^3 + 1 = (x^3)^2 + x^3 + 1 =: h(x)$.
Again it is convenient to set $y = x + 1$. We find $h(y) = (y^3)^2 + y^3 + 1
= ((x+1)^3)^2 + (x+1)^3 + 1 \equiv (x^3 + 1)^2 + (x^3 + 1) + 1 \pmod 3 \equiv x^6 \pmod 3$.
The constant term of $h(y)$ is just 3, so $h$ is irreducible by EC. We make these
ideas precise using the following definition and results.


Definition 21.19. Let $R$ be a commutative ring with 1, and let $\mathfrak{a}$ be an ideal
of $R$. Let $b, c \in R$. Then we write $b \equiv c \pmod{\mathfrak{a}}$ to mean $c - b \in \mathfrak{a}$. We read
the former expression as "$b$ is congruent to $c$ modulo the ideal $\mathfrak{a}$."




Lemma 21.20. Let $R$ be a commutative ring with 1, and let $p$ be a prime in
$\mathbb{Z}$. Let $\mathfrak{p} = pR$, the ideal of $R$ generated by the image of $p$ under the natural
map $\chi : \mathbb{Z} \to R$ (see Lemma 16.20). Then for all $a, b \in R$, we have

$$(a + b)^p \equiv a^p + b^p \pmod{\mathfrak{p}}. \tag{21.8}$$

More generally, if $a_1, \ldots, a_r \in R$, then we have

$$(a_1 + \cdots + a_r)^p \equiv a_1^p + \cdots + a_r^p \pmod{\mathfrak{p}}. \tag{21.9}$$


Proof. By the Binomial Theorem in $R$ (Exercise 16.19), we have $(a + b)^p =
\sum_{k=0}^{p} \binom{p}{k} a^k b^{p-k}$. If $1 \le k \le p - 1$, then certainly $p \nmid k!$ and $p \nmid (p - k)!$,
but $p \mid p!$, so $p \mid \binom{p}{k}$. This establishes Equation (21.8). Equation (21.9) follows
immediately by induction on $r$.
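
The divisibility fact at the heart of this proof can be checked numerically. The sketch below is illustrative only and assumes the special case $R = \mathbb{Z}$, where Equation (21.9) also follows from Fermat's Little Theorem; the function names are hypothetical.

```python
from math import comb

def middle_binomials_divisible(p):
    """True if p divides C(p, k) for every k with 1 <= k <= p - 1."""
    return all(comb(p, k) % p == 0 for k in range(1, p))

def freshman_dream_holds(p, values):
    """Check (a_1 + ... + a_r)^p == a_1^p + ... + a_r^p modulo p for integers a_i."""
    lhs = pow(sum(values), p, p)
    rhs = sum(pow(a, p, p) for a in values) % p
    return lhs == rhs

for p in (2, 3, 5, 7, 11):
    assert middle_binomials_divisible(p)
    assert freshman_dream_holds(p, [4, 9, 13])

# The primality hypothesis matters: 4 does not divide C(4, 2) = 6.
print(middle_binomials_divisible(4))   # False
```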


The definition of the characteristic of a ring makes the following result 
immediate: 


Corollary 21.21. Let $p$ be a prime in $\mathbb{Z}$, and let $R$ be a commutative ring
with 1 such that $\mathrm{char}(R) = p$. Then the map $t : R \to R$, $a \mapsto a^p$ is a ring
homomorphism. In particular, for all $a_1, \ldots, a_r \in R$, we have
$(a_1 + \cdots + a_r)^p = a_1^p + \cdots + a_r^p$.


Proposition 21.22. Let $q = p^r$, where $p$ is a positive prime integer and $r$ is
a positive integer. Then

$$\Phi_q(x) = \frac{x^{p^r} - 1}{x^{p^{r-1}} - 1}. \tag{21.10}$$

Proof. Let $f(x) = (x^{p^r} - 1)/(x^{p^{r-1}} - 1) \in \mathbb{Q}(x)$. Set $z = x^{p^{r-1}}$, so that

$$f = (z^p - 1)/(z - 1) = z^{p-1} + z^{p-2} + \cdots + z + 1 \in \mathbb{Z}[x]. \tag{21.11}$$

Let $\Omega : \mathbb{Q}[x] \to \mathbb{Q}[x]$ be the function given by $g(x) \mapsto g(x + 1)$ for $g \in \mathbb{Q}[x]$,
and let $\mathfrak{p} = p \cdot \mathbb{Z}[x]$. Note that $p$ is irreducible in $\mathbb{Z}[x]$, which is a UFD by
Proposition 20.45, so $p$ is prime in $\mathbb{Z}[x]$ by Theorem 20.26, hence $\mathfrak{p}$ is a prime
ideal by Exercise 12.12. We have

$$\Omega(f) = ((x+1)^{p^{r-1}})^{p-1} + ((x+1)^{p^{r-1}})^{p-2} + \cdots + (x+1)^{p^{r-1}} + 1
\equiv (z+1)^{p-1} + (z+1)^{p-2} + \cdots + (z+1) + 1 \pmod{\mathfrak{p}}$$

by Lemma 21.20. So $\Omega(f) \equiv ((z+1)^p - 1)/((z+1) - 1) \pmod{\mathfrak{p}}$. Now set
$\tilde{f} := ((z+1)^p - 1)/z \equiv z^p/z = z^{p-1} = x^{p^{r-1}(p-1)} \pmod{\mathfrak{p}}$. Then $\Omega(f) \equiv \tilde{f} \equiv
x^{p^{r-1}(p-1)} \pmod{\mathfrak{p}}$, and the constant term of $\Omega(f)$ is $(\Omega(f))(0) = f(1) = p$
(by Equation 21.11). So $\Omega(f)$ is irreducible in $\mathbb{Z}[x]$ by Eisenstein's Criterion, hence
irreducible in $\mathbb{Q}[x]$ by Corollary 20.44. Since $\Omega \in \mathrm{Aut}(\mathbb{Q}[x])$ (e.g. by Exercise
21.5), then $f$ is also irreducible in $\mathbb{Q}[x]$ by Exercise 21.6. Let $g = x^{p^{r-1}} - 1$.
We have

$$x^{p^r} - 1 = fg. \tag{21.12}$$

Let $\zeta_q = e^{2\pi i/q}$, as usual. Evaluating both sides of Equation (21.12) at $x = \zeta_q$
gives $0 = f(\zeta_q) \cdot g(\zeta_q)$. Now $g(\zeta_q) \ne 0$ by Lemma 21.11, so $f(\zeta_q) = 0$. Since
$f$ is monic and irreducible in $\mathbb{Q}[x]$, it follows from Exercise 15.2 that $f =
\mathrm{Irr}(\zeta_q, \mathbb{Q}, x) =: \Phi_q(x)$.
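
To see the proof in action for specific prime powers, one can build $f$ from Equation 21.11, apply the shift $x \mapsto x + 1$, and confirm the two facts the proof uses: the constant term of the shifted polynomial is $p$, and Eisenstein's Criterion applies at $p$. This is a hedged Python sketch; the helper names and the coefficient-list representation (lowest degree first) are assumptions made for illustration.

```python
from math import comb

def phi_prime_power(p, r):
    """Coefficients of 1 + z + ... + z^(p-1), where z = x^(p^(r-1))."""
    step = p ** (r - 1)
    coeffs = [0] * (step * (p - 1) + 1)
    for k in range(p):
        coeffs[k * step] = 1
    return coeffs

def shift_by_one(coeffs):
    """Return the coefficients of h(x + 1), given h as [c0, c1, ..., cd]."""
    out = [0] * len(coeffs)
    for k, c in enumerate(coeffs):
        for j in range(k + 1):
            out[j] += c * comb(k, j)
    return out

def eisenstein_applies(coeffs, p):
    """Check the hypotheses of Eisenstein's Criterion at the prime p."""
    *lower, lead = coeffs
    return (lead % p != 0
            and all(c % p == 0 for c in lower)
            and lower[0] % (p * p) != 0)

for p, r in [(2, 3), (3, 2), (5, 1), (5, 2)]:
    f = phi_prime_power(p, r)
    shifted = shift_by_one(f)
    assert shifted[0] == p                       # constant term is f(1) = p
    assert eisenstein_applies(shifted, p)        # so the shifted polynomial is irreducible
    assert len(f) - 1 == p**r - p**(r - 1)       # degree of f
```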




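Corollary 21.23. Let $q$ be a prime power. Then $\deg(\Phi_q(x)) = \varphi(q)$, and we
have $\mathrm{Gal}(\mathbb{Q}[\zeta_q]/\mathbb{Q}) \cong (\mathbb{Z}/q\mathbb{Z})^\times$ via the embedding of Proposition 21.17.


Proof. Write $q = p^r$ where $p$ is prime in $\mathbb{Z}$ and $p, r \in \mathbb{N}$. We have

$$\begin{aligned}
|\mathrm{Gal}(\mathbb{Q}[\zeta_q]/\mathbb{Q})| &= [\mathbb{Q}[\zeta_q] : \mathbb{Q}] && \text{(by FTGT, Theorem 16.32)} \\
&= \deg(\mathrm{Irr}(\zeta_q, \mathbb{Q}, x)) && \text{(by Lemma 15.25)} \\
&= \deg(\Phi_q(x)) && \text{(by definition of } \Phi_q\text{)} \\
&= p^r - p^{r-1} && \text{(by Proposition 21.22)} \\
&= \varphi(p^r) && \text{(by Exercise 17.7)} \\
&= |(\mathbb{Z}/q\mathbb{Z})^\times| && \text{(by definition of } \varphi(q)\text{)}.
\end{aligned}$$

Since this quantity is finite, the embedding must be a bijection.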
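
The last two equalities in this chain are purely arithmetic and can be spot-checked by brute force. A small Python sketch (the function name `unit_count` is ours) counts the units of $\mathbb{Z}/q\mathbb{Z}$ directly and compares with $p^r - p^{r-1}$:

```python
from math import gcd

def unit_count(n):
    """Number of residues a in {1, ..., n} with gcd(a, n) = 1, i.e. |(Z/nZ)^x|."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

for p, r in [(2, 1), (2, 4), (3, 3), (5, 2), (7, 1)]:
    q = p ** r
    assert unit_count(q) == p**r - p**(r - 1)
```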


It turns out that Corollary 21.23 is true for any positive integer n, not just 
for prime powers q. (See Exercise 21.15 for a proof of this statement which 
relies on a well-known theorem from number theory.) However, even without 
this more general result we already know enough about the structure of cyclo- 
tomic fields to settle the question of which regular polygons are constructible. 


Corollary 21.24. Let $n$ be a positive integer. Suppose that $n = \prod_{i=1}^{r} p_i^{e_i}$
is the prime factorization of $n$. Then $\mathrm{lcm}(\{\varphi(p_i^{e_i}) : 1 \le i \le r\})$ divides
$\deg(\Phi_n(x))$, which in turn divides $\varphi(n)$.


Proof. Pick a prime $p = p_i$ dividing $n$, and let $q = p_i^{e_i}$, $m = n/q$. Then
we have $\zeta_q = \zeta_n^m$, from which we see that $\mathbb{Q}[\zeta_q] \le \mathbb{Q}[\zeta_n]$. So $[\mathbb{Q}[\zeta_q] : \mathbb{Q}]$
divides $[\mathbb{Q}[\zeta_n] : \mathbb{Q}]$ (by Lemma 15.26 and Exercise 15.9). But $[\mathbb{Q}[\zeta_q] :
\mathbb{Q}] = \varphi(q)$ (see the proof of Corollary 21.23), while $[\mathbb{Q}[\zeta_n] : \mathbb{Q}] = \deg(\Phi_n)$.
This establishes the first divisibility statement. For the second, note that the
order of $\mathrm{Gal}(\mathbb{Q}[\zeta_n]/\mathbb{Q})$ divides the order of $(\mathbb{Z}/n\mathbb{Z})^\times$ by Proposition 21.17 and
Lagrange's Theorem.


Theorem 21.25. Let $n$ be a positive integer, $n \ge 3$. Then a regular $n$-gon is
constructible by straightedge and compass iff $\varphi(n)$ is a power of 2. This occurs
iff the prime factorization of $n$ has the form $n = 2^a \cdot \prod_{i=1}^{r} p_i$, where $a \in \mathbb{N}$
and the $p_i$ are distinct odd primes such that each $p_i - 1$ is a power of 2.


Proof. From Corollary 21.24 and Exercise 17.7, we see that a prime divides
$\deg(\Phi_n)$ iff it divides $\varphi(n)$. Since $\deg(\Phi_n) = [\mathbb{Q}[\zeta_n] : \mathbb{Q}]$, the first part of the
result now follows from Proposition 21.18. For the second part, we can always
write $n = 2^a \cdot \prod_{i=1}^{r} p_i^{e_i}$, where $a, e_i \in \mathbb{N}$, $e_i \ge 1$, and the $p_i$ are distinct odd
primes. Then $\varphi(n) = 2^{a-1} \cdot \prod_{i=1}^{r} p_i^{e_i - 1}(p_i - 1)$ (by Exercise 17.7), where the
factor $2^{a-1}$ is to be read as 1 when $a = 0$. The desired
conclusion now follows by inspection.
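
The criterion of Theorem 21.25 is simple to turn into a computation. The following Python sketch lists the constructible regular $n$-gons up to a chosen bound by testing whether $\varphi(n)$ is a power of 2; the function names are hypothetical, and the totient is computed by plain trial division, which is adequate only for small $n$.

```python
def totient(n):
    """Euler's phi function via trial-division factorization."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p   # multiply result by (1 - 1/p)
        p += 1
    if m > 1:                       # one prime factor larger than sqrt(n) remains
        result -= result // m
    return result

def is_power_of_two(k):
    return k >= 1 and k & (k - 1) == 0

def constructible_ngons(limit):
    """All n with 3 <= n <= limit such that phi(n) is a power of 2."""
    return [n for n in range(3, limit + 1) if is_power_of_two(totient(n))]

print(constructible_ngons(40))
# [3, 4, 5, 6, 8, 10, 12, 15, 16, 17, 20, 24, 30, 32, 34, 40]
```

Note that 7, 9, and 13 are absent: $\varphi(7) = 6$, $\varphi(9) = 6$, and $\varphi(13) = 12$ are not powers of 2.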


Remark 21.26. An odd prime p € N such that p—1 is a power of 2 is known as 
a Fermat prime. The only known Fermat primes are 3, 5, 17, 257, and 65537. 




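Example 21.27. The smallest Fermat prime is 3, so Theorem 21.25 guarantees
that a regular 3-gon is constructible. Indeed, we have $\cos(2\pi/3) = -1/2 \in \mathbb{Q}$.

Example 21.28. The second-smallest Fermat prime is 5. To construct $\alpha :=
\cos(2\pi/5)$, we start with the tower of fields $\mathbb{Q} \le \mathbb{Q}[\alpha] =: F \le \mathbb{Q}[\zeta_5] =: K$.
The index between the last two fields in this tower is 2 (by Proposition 21.16),
while the overall index of $\mathbb{Q}$ in $K$ is $\varphi(5) = 4$ (by Corollary 21.23). It follows
that $[F : \mathbb{Q}] = 2$. We have $\mathrm{Irr}(\zeta_5, \mathbb{Q}, x) = (x^5 - 1)/(x - 1) = x^4 + x^3 + x^2 + x + 1$
(by Proposition 21.22), so the fundamental relation satisfied by $\zeta_5$ over $\mathbb{Q}$ is

$$\zeta_5^4 + \zeta_5^3 + \zeta_5^2 + \zeta_5 + 1 = 0. \tag{21.13}$$

Of course, we also have $\zeta_5^5 = 1$ (e.g. by Lemma 21.11); so $\zeta_5^a = \zeta_5^b$ whenever
$a \equiv b \pmod 5$ (by Exercise 5.7).

Again by Corollary 21.23, we have $\mathrm{Gal}(K/\mathbb{Q}) =: G \cong (\mathbb{Z}/5\mathbb{Z})^\times$. Now
$(\mathbb{Z}/5\mathbb{Z})^\times$ is cyclic, generated by 2 (and by 3, but let's use 2). The isomorphism
of Corollary 21.23 therefore gives $G = \langle \sigma \rangle$, where $\sigma(\zeta_5) = \zeta_5^2$ and, in general,
$\sigma^k(\sum_j c_j \zeta_5^j) = \sum_j c_j \zeta_5^{2^k j}$ for $c_j \in \mathbb{Q}$ and $k \in \mathbb{N}$. By the Fundamental
Theorem of Galois Theory (FTGT), the field $F$ is the fixed field of a subgroup
$H \le G$ of order 2. There is only one such subgroup, namely $H := \langle \sigma^2 \rangle$. Again
by FTGT, we have $L := \mathrm{Gal}(F/\mathbb{Q}) \cong G/H$, via the map $\omega \mapsto \omega|_F$. The
set of conjugates of $\alpha$ over $\mathbb{Q}$ in $\mathbb{C}$ is therefore $C := \{\tau(\alpha) : \tau \in G\}
= \{\alpha, \sigma(\alpha)\}$. We have $\alpha = (\zeta_5 + \zeta_5^{-1})/2$, while $\sigma(\alpha) = (\zeta_5^2 + \zeta_5^{-2})/2$. So
$C = \{(\zeta_5 + \zeta_5^{-1})/2, (\zeta_5^2 + \zeta_5^{-2})/2\}$. We calculate

$$\begin{aligned}
\mathrm{Irr}(\alpha, \mathbb{Q}, x) &= \prod_{\gamma \in C} (x - \gamma) && \text{(by Exercise 19.10)} \\
&= (x - \alpha)(x - \sigma(\alpha)) \\
&= (x - (\zeta_5 + \zeta_5^{-1})/2)(x - (\zeta_5^2 + \zeta_5^{-2})/2) \\
&= x^2 - (\zeta_5 + \zeta_5^{-1} + \zeta_5^2 + \zeta_5^{-2})x/2 + (\zeta_5^3 + \zeta_5^{-1} + \zeta_5 + \zeta_5^{-3})/4 \\
&= x^2 - (\zeta_5 + \zeta_5^4 + \zeta_5^2 + \zeta_5^3)x/2 + (\zeta_5^3 + \zeta_5^4 + \zeta_5 + \zeta_5^2)/4 \\
&= x^2 - (-1)x/2 + (-1)/4 && \text{(by Equation 21.13)} \\
&= x^2 + x/2 - 1/4.
\end{aligned}$$

Therefore, $\alpha^2 + \alpha/2 - 1/4 = 0$, so $\alpha = (-1/2 \pm \sqrt{1/4 - 4 \cdot 1 \cdot (-1/4)})/2
= (-1/2 \pm \sqrt{5/4})/2 = -1/4 \pm \sqrt{5}/4$. Since $\alpha > 0$, we have $\alpha = (\sqrt{5} - 1)/4$.
(See Figure 21.4.)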
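
As a sanity check on Example 21.28 (not a substitute for the algebra), the closed form and the minimal polynomial can be verified in floating point. This is a brief Python sketch; the tolerance values are arbitrary choices.

```python
import math

sqrt5 = math.sqrt(5)

def min_poly(x):
    """The polynomial x^2 + x/2 - 1/4 found in Example 21.28."""
    return x * x + x / 2 - 0.25

alpha = math.cos(2 * math.pi / 5)       # the positive root
conjugate = math.cos(4 * math.pi / 5)   # sigma(alpha), the negative root

assert abs(min_poly(alpha)) < 1e-12
assert abs(min_poly(conjugate)) < 1e-12
assert math.isclose(alpha, (sqrt5 - 1) / 4, rel_tol=1e-12)
assert math.isclose(conjugate, (-sqrt5 - 1) / 4, rel_tol=1e-12)
```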

FIGURE 21.4: A regular 5-gon, aka pentagon

Example 21.29. Let us step through the details of constructing a regular 15-gon
by straightedge and compass. More accurately speaking, we will show how
to write $\cos(2\pi/15)$ using only rational numbers, arithmetic operations, and
square roots. Since $15 = 3 \cdot 5$, Theorem 21.25 assures us that this is possible.
However, in this situation we are somewhat at a loss as to the Galois group
$G := \mathrm{Gal}(K/\mathbb{Q})$, with $K := \mathbb{Q}[\zeta_{15}]$. We know that $G \hookrightarrow (\mathbb{Z}/15\mathbb{Z})^\times$, but we are
unwilling to use the result of Exercise 21.15 giving $G \cong (\mathbb{Z}/15\mathbb{Z})^\times$, since this
relies on a theorem of Dirichlet which is beyond the scope of this text. Our
strategy (as so often in mathematics) is to find a procedure which "should"
work if in fact our hunch is correct that this embedding is an isomorphism.

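So set $\Gamma := (\mathbb{Z}/15\mathbb{Z})^\times$, and write $\zeta$ for $\zeta_{15}$. We suspect that $|G| = |\Gamma| =
8$ and thus that every element of $K$ has a unique expression of the form
$\sum_{j=0}^{7} c_j \zeta^j$ with $c_j \in \mathbb{Q}$ (by Lemma 15.25). From Exercise 17.7, we have $\Gamma \cong
(\mathbb{Z}/3\mathbb{Z})^\times \oplus (\mathbb{Z}/5\mathbb{Z})^\times$ via the map $\psi : a + 15\mathbb{Z} \mapsto (a + 3\mathbb{Z}, a + 5\mathbb{Z})$. Wouldn't
it be nice if we could represent an element of $K$ using $\zeta_3$ and $\zeta_5$ instead of $\zeta$,
and make this representation mesh with the action of the direct summands
of $\Gamma$? Well, we can! On the one hand, we have $\zeta_3 = \zeta^5$ and $\zeta_5 = \zeta^3$, so
$\mathbb{Q}[\zeta_5, \zeta_3] \le \mathbb{Q}[\zeta]$; while on the other hand, we have $\zeta = \zeta_5^2 \cdot \zeta_3^{-1} \in \mathbb{Q}[\zeta_5, \zeta_3]$.
So $\mathbb{Q}[\zeta] = \mathbb{Q}[\zeta_5, \zeta_3]$. Set $L := \mathbb{Q}[\zeta_5]$. We know from Corollary 21.23 that
$[L : \mathbb{Q}] = 4$, and thus $L = \{\sum_{j=0}^{3} c_j \zeta_5^j : c_j \in \mathbb{Q}\}$. Now $K = L[\zeta_3]$, and $\zeta_3$ is
a root of $\mathrm{Irr}(\zeta_3, \mathbb{Q}, x) = (x^3 - 1)/(x - 1) = x^2 + x + 1$. So $\deg(g) \le 2$, where
$g := \mathrm{Irr}(\zeta_3, L, x)$. It follows that either $K = L$ or else every element of $K$ has
a unique expression of the form $\ell_1 + \ell_2 \zeta_3$ with $\ell_1, \ell_2 \in L$ (again by Lemma
15.25). In either case, we have $K = \{\ell_1 + \ell_2 \zeta_3 : \ell_1, \ell_2 \in L\}$ (although in the
former case, this expression is not unique). To summarize, we have

$$K = \{c_0 + c_1\zeta_5 + c_2\zeta_5^2 + c_3\zeta_5^3 + (d_0 + d_1\zeta_5 + d_2\zeta_5^2 + d_3\zeta_5^3)\zeta_3 : c_j, d_\ell \in \mathbb{Q}\}. \tag{21.14}$$

Let us verify that the action of $G$ on $K$ is compatible with our factored point
of view. If $\sigma \in G$ is an automorphism such that $\sigma(\zeta) = \zeta^r$, then we have
$\sigma(\zeta_3) = \sigma(\zeta^5) = \zeta^{5r} = \zeta_3^r$ and $\sigma(\zeta_5) = \sigma(\zeta^3) = \zeta^{3r} = \zeta_5^r$. Conversely, if $\sigma \in G$
is such that $\sigma(\zeta_3) = \zeta_3^t$ and $\sigma(\zeta_5) = \zeta_5^u$, then $\sigma(\zeta) = \zeta^r$, where $r$ is the unique
integer modulo 15 such that $r \equiv t \pmod 3$ and $r \equiv u \pmod 5$. Thus, an
element $(t + 3\mathbb{Z}, u + 5\mathbb{Z}) \in \Gamma$, when viewed as an element of $G$, has the action
$\zeta_3 \mapsto \zeta_3^t$, $\zeta_5 \mapsto \zeta_5^u$. Beautiful!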

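Now we are ready to calculate. Set $\alpha = \cos(2\pi/15) = (\zeta + \zeta^{-1})/2$. We first
write $\alpha$ in the form indicated in Equation 21.14. We have $\alpha = (\zeta + \zeta^{-1})/2
= (\zeta_5^2 \zeta_3^{-1} + \zeta_5^{-2} \zeta_3)/2 = (\zeta_5^2 \zeta_3^2 + \zeta_5^3 \zeta_3)/2 = (\zeta_5^2(-1 - \zeta_3) + \zeta_5^3 \zeta_3)/2 = -\tfrac{1}{2}\zeta_5^2 +
(-\tfrac{1}{2}\zeta_5^2 + \tfrac{1}{2}\zeta_5^3)\zeta_3$. Now if in fact $|G| = |\Gamma| = 8$, then the conjugates of $\alpha$ over
$L$ should be $\alpha$ itself and $\sigma(\alpha)$, where $\sigma$ is the automorphism of $K$ over $\mathbb{Q}$
which fixes $\zeta_5$ but not $\zeta_3$: that is, $\sigma$ would send $\zeta_3$ to $\zeta_3^2$ and $\zeta_5$ to $\zeta_5$. We
would have $\sigma(\alpha) = -\tfrac{1}{2}\zeta_5^2 + (-\tfrac{1}{2}\zeta_5^2 + \tfrac{1}{2}\zeta_5^3)\zeta_3^2 = -\tfrac{1}{2}\zeta_5^3 + (\tfrac{1}{2}\zeta_5^2 - \tfrac{1}{2}\zeta_5^3)\zeta_3$. But if
$|G| < 8$, then such a map $\sigma$ is not well-defined; nevertheless, no one can stop
us from setting $\beta = -\tfrac{1}{2}\zeta_5^3 + (\tfrac{1}{2}\zeta_5^2 - \tfrac{1}{2}\zeta_5^3)\zeta_3$ and computing what should be the
irreducible polynomial of $\alpha$ over $L$, namely

$$\begin{aligned}
f &:= (x - \alpha)(x - \beta) \\
&= x^2 - (\alpha + \beta)x + (\alpha\beta) \\
&= x^2 + (\zeta_5^2 + \zeta_5^3)x/2 + (\zeta_5^5 + (-\zeta_5^4 + 2\zeta_5^5 - \zeta_5^6)\zeta_3 + (-\zeta_5^4 + 2\zeta_5^5 - \zeta_5^6)\zeta_3^2)/4 \\
&= x^2 + (\zeta_5^2 + \zeta_5^3)x/2 + (-1 + \zeta_5 + \zeta_5^4)/4 \\
&= x^2 + (\zeta_5^2 + \zeta_5^3)x/2 + (-2 - \zeta_5^2 - \zeta_5^3)/4.
\end{aligned}$$

Here the second-to-last step uses $\zeta_5^5 = 1$ and $\zeta_3 + \zeta_3^2 = -1$, and the last step uses Equation 21.13.

Set $\gamma := \zeta_5^2 + \zeta_5^3$. Then $\gamma \in L$, and $\gamma$ is fixed by complex conjugation $\zeta_5 \mapsto \zeta_5^4$,
so the conjugates of $\gamma$ over $\mathbb{Q}$ are $\gamma$ and $\tau(\gamma)$, where $\tau(\zeta_5) = \zeta_5^2$. Now $\tau(\gamma) =
\zeta_5^4 + \zeta_5^6 = -1 - \zeta_5^2 - \zeta_5^3$. We compute $(x - \gamma)(x - \tau(\gamma)) = x^2 + x + (-2 - \zeta_5 -
\zeta_5^2 - \zeta_5^3 - \zeta_5^4) = x^2 + x - 1$. Therefore $\gamma = (-1 \pm \sqrt{5})/2$. Glancing at the position
of $\zeta_5^2$ and $\zeta_5^3$ in the complex plane shows (see Figure 21.5) that $\zeta_5^2 + \zeta_5^3 < 0$.
Therefore, $\gamma = (-1 - \sqrt{5})/2$. So we have $f = x^2 - (1 + \sqrt{5})x/4 + (-3 + \sqrt{5})/8$.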


FIGURE 21.5: the fifth roots of 1 in $\mathbb{C}$


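Since $\alpha$ is a root of $f$, we use the quadratic formula to find $\alpha$. The discriminant
of $f$ is $((1 + \sqrt{5})/4)^2 - 4 \cdot 1 \cdot (-3 + \sqrt{5})/8 = (6 + 2\sqrt{5})/16 - (-3 + \sqrt{5})/2
= (15 - 3\sqrt{5})/8$. So $\alpha = \left((1 + \sqrt{5})/4 \pm \sqrt{(15 - 3\sqrt{5})/8}\right)/2 = (1 + \sqrt{5} \pm
\sqrt{30 - 6\sqrt{5}})/8$. Now $1 + \sqrt{5} < \sqrt{30 - 6\sqrt{5}}$ iff $6 + 2\sqrt{5} < 30 - 6\sqrt{5}$ iff $8\sqrt{5} < 24$
iff $\sqrt{5} < 3$, which is true. Therefore, as $\alpha = \cos(2\pi/15) > 0$, we have

$$\cos(2\pi/15) = \frac{1 + \sqrt{5} + \sqrt{30 - 6\sqrt{5}}}{8}. \tag{21.15}$$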
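
The intermediate and final values of Example 21.29 can be checked numerically. The Python sketch below is only a floating-point sanity check under arbitrary tolerances: it confirms the value of $\gamma = \zeta_5^2 + \zeta_5^3$, that $\cos(2\pi/15)$ is a root of the quadratic $f$, and that Equation 21.15 agrees with the library cosine.

```python
import cmath
import math

sqrt5 = math.sqrt(5)
zeta5 = cmath.exp(2j * math.pi / 5)

# gamma = zeta_5^2 + zeta_5^3 should be the negative root of x^2 + x - 1.
gamma = zeta5**2 + zeta5**3
assert abs(gamma - (-1 - sqrt5) / 2) < 1e-12

def f(x):
    """The quadratic x^2 - (1 + sqrt 5) x / 4 + (-3 + sqrt 5) / 8 from the example."""
    return x * x - (1 + sqrt5) * x / 4 + (-3 + sqrt5) / 8

alpha = math.cos(2 * math.pi / 15)
closed_form = (1 + sqrt5 + math.sqrt(30 - 6 * sqrt5)) / 8

assert abs(f(alpha)) < 1e-12
assert math.isclose(alpha, closed_form, rel_tol=1e-12)
```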




Remark 21.30. In fact, it is not hard to show that $[K : \mathbb{Q}] = 8$ in Example
21.29. For suppose not; then $[K : \mathbb{Q}] = 4$, $K = L = \mathbb{Q}[\zeta_5]$, and $\mathrm{Gal}(K/\mathbb{Q}) \cong
(\mathbb{Z}/5\mathbb{Z})^\times \cong \mathbb{Z}_4$. We would have $(-1 + i\sqrt{3})/2 = \zeta_3 \in K$. But $\mathbb{Z}_4$ has a unique
subgroup of order 2, corresponding by Galois theory to a unique quadratic
extension of $\mathbb{Q}$ in $K$; yet both $\mathbb{Q}[\sqrt{5}]$ and $\mathbb{Q}[i\sqrt{3}]$ are subfields of $K$ which
are quadratic over $\mathbb{Q}$, and these fields are clearly not equal, since one is in $\mathbb{R}$
but the other is not. We purposely avoided this argument in that example in
order to show how one can make do with only partial information.


Remark 21.31. The reader is encouraged to work on Example 21.29 using the
unfactored group $(\mathbb{Z}/15\mathbb{Z})^\times$ and the cyclotomic polynomial $\Phi_{15}$ instead of the
factored form $(\mathbb{Z}/3\mathbb{Z})^\times \oplus (\mathbb{Z}/5\mathbb{Z})^\times$ and the polynomials $\Phi_3$ and $\Phi_5$, if only to
become convinced that the factored approach is easier.


21.4 Exercises 


Exercise 21.1. Notice that in Operations (1), (2), and (3), we implicitly assumed
that our compass can be extended to an arbitrary distance, i.e., that our
string is infinitely long. Would the set $\mathcal{C}$ of constructible numbers change if we
assumed instead that the string had a finite length $\ell$? If so, would $\mathcal{C}$ depend
on $\ell$?


Exercise 21.2. Prove that if $\alpha$ is constructible and $\alpha \ne 0$, then $1/\alpha$ is also
constructible. This finishes the proof of part (II) of Claim 21.2.


Exercise 21.3. Prove part (IV) of Claim 21.2. Suggestion: Impose Cartesian 
coordinates so that the initial line segment has endpoints at (0,0) and (1,0). 
Induct on the total number of operations performed, including multiplicity, 
and at each step, establish that the coordinates of every identified point are 
themselves constructible numbers. 


Exercise 21.4. Let $R$ be a UFD, and let $f(x) = \sum_{i=0}^{n} a_i x^i \in R[x]$ with $a_i \in R$
and $a_n \ne 0$. Let $K$ be the field of fractions of $R$. Prove that if $\alpha \in K$ and
$f(\alpha) = 0$, then we can write $\alpha = a/b$ where $a, b \in R$, $b \ne 0$, and $a \mid a_0$, $b \mid a_n$
in $R$. (The special case of this result when $R = \mathbb{Z}$ is known as the Rational
Root Theorem.) Hint: Exercise 20.20 may be useful.


Exercise 21.5. Let $R$ be a commutative ring with 1, and let $S = R[x]$. Let
$h \in S$.

(a) Prove that there is a unique ring homomorphism $\psi : S \to S$ over $R$
such that $\psi(x) = h$. (It is natural to denote $\psi(f)$ by $f \circ h$ or $f(h)$ for $f \in S$;
see Exercise 14.15.)

(b) Suppose now that $R$ is a field. Prove that $\psi \in \mathrm{Aut}(S)$ iff $\deg(h) = 1$.


Exercise 21.6. Let $R$ be a domain and let $\sigma \in \mathrm{Aut}(R)$. Let $f \in R$. Prove that
$f$ is irreducible in $R$ iff $\sigma(f)$ is irreducible in $R$.




Exercise 21.7. Use the identities $\cos(a + b) = \cos(a)\cos(b) - \sin(a)\sin(b)$ and
$\sin(a + b) = \sin(a)\cos(b) + \cos(a)\sin(b)$ to prove the "triple-angle formula"
$\cos(3\phi) = 4\cos^3(\phi) - 3\cos(\phi)$.

Exercise 21.8. Let $n$ be an integer with $n \ge 3$. Set $F := \mathbb{Q}[\cos(2\pi/n)]$ and
$K := \mathbb{Q}[\zeta_n]$. Let $\sigma : \mathbb{C} \to \mathbb{C}$ be the complex conjugation map, $\sigma(a + bi) = a - bi$
for $a, b \in \mathbb{R}$. Let $\tau$ be the restriction of $\sigma$ to $K$. Let $G = \mathrm{Gal}(K/F)$.

(a) Prove that $\tau \in G$.

(b) Let $T = \langle \tau \rangle \le G$. Show that $|T| = 2$.

(c) Prove that $F = K^T$ is the fixed field of $T$, and thus that we have
$F = K \cap \mathbb{R}$.


Exercise 21.9. (a) Let $a, b \in \mathbb{Z}$, and let $g = \gcd(a, b)$. Prove that the ideal
$(a, b)$ is equal to the ideal $(g)$ in $\mathbb{Z}$. Conclude that there exist $c, d \in \mathbb{Z}$ such
that $ca + db = g$. (Compare Exercise 4.7.)

(b) Let $n$ be a positive integer, and suppose that $n$ factors as $n = ab$, with
$a, b \in \mathbb{N}$ and $\gcd(a, b) = 1$. Prove that $\mathbb{Q}[\zeta_n] = \mathbb{Q}[\zeta_a, \zeta_b]$.

(c) Prove that if $n = \prod_{j=1}^{r} p_j^{e_j}$ where $p_1, \ldots, p_r$ are distinct positive prime
integers and $e_j \in \mathbb{N}$, then $\mathbb{Q}[\zeta_n] = \mathbb{Q}[\zeta_{q_1}, \ldots, \zeta_{q_r}]$, where $q_j := p_j^{e_j}$.


Exercise 21.10. Find an explicit formula for $\cos(2\pi/n)$ for the following values
of $n$. In each case, describe a tower of fields starting with $\mathbb{Q}$ and ending
with $\mathbb{Q}[\cos(2\pi/n)]$ such that each successive extension has degree 2 over the
previous field.

(a) $n = 24$.

(b) $n = 17$.

Exercise 21.11. How many nested levels of square roots would be needed
in order to write $\cos(2\pi/65537)$ explicitly using only rational numbers, field
operations, and square roots?


Exercise 21.12. Let $n$ be a positive integer, and suppose that $\rho$ is an $n$th root
of 1 in $\mathbb{C}$ (not necessarily primitive); that is, $\rho^n = 1$. Let $K = \mathbb{Q}[\rho]$. Let $m$ be
the order of $\rho$ in $K^\times$.

(a) Prove that $K = \mathbb{Q}[\zeta_m]$, the $m$th cyclotomic field.

(b) Prove that if $m > 2$, then $K \ne \mathbb{Q}$ and $[K : \mathbb{Q}]$ is even.


Exercise 21.13. Let $K$ be a finite Galois extension field of $F$, and let $G =
\mathrm{Gal}(K/F)$. Define a function $N : K \to K$ by the formula $N(\alpha) = \prod_{\sigma \in G} \sigma(\alpha)$
for $\alpha \in K$. (Compare Exercise 20.25.)

(a) Prove that $N(K) \subseteq F$. Note: we also write $N_{K/F}$ for $N$, and describe
$N$ as the "norm map from $K$ down to $F$."

(b) Prove that $\forall \alpha, \beta \in K$, $N(\alpha\beta) = N(\alpha) \cdot N(\beta)$.

(c) Suppose that $F \le L \le K$ and $L$ is Galois over $F$. Prove that $N_{K/F} =
N_{L/F} \circ N_{K/L}$.

Exercise 21.14. (a) Let $R$ be a UFD, and let $K = k(R)$ be the field of fractions
of $R$. Let $L$ be a finite extension field of $K$. Suppose that $\alpha \in L$ and $\alpha$
is integral over $R$ (see Exercise 20.26). Prove that $\mathrm{Irr}(\alpha, K, x) \in R[x]$. Hint:
Corollary 20.43 may be useful.

(b) Use part (a) above to conclude that $\Phi_n \in \mathbb{Z}[x]$ for all positive integers
$n$.

(c) Let $n$ be a positive integer and let $d = \deg(\Phi_n)$. Use part (b) above to
prove that $\mathbb{Z}[\zeta_n] = \{a_0 + a_1\zeta_n + \cdots + a_{d-1}\zeta_n^{d-1} : a_i \in \mathbb{Z}\}$. Also note that this
representation of an element of $\mathbb{Z}[\zeta_n]$ is unique.


Exercise 21.15. In this exercise, you will make use of the following result to
find the cyclotomic polynomial $\Phi_n$ for any positive integer $n$.


Theorem 21.32 (Dirichlet's Theorem on Primes in Arithmetic Progressions).
Let $m, n$ be two positive integers such that $\gcd(m, n) = 1$. Then there are
infinitely many primes $p$ such that $p \equiv m \pmod n$.


Let $n$ be a positive integer, and suppose that $\sum_{j=0}^{r} c_j \zeta_n^j = 0$ with $r \in \mathbb{N}$
and $c_j \in \mathbb{Z}$. Let $m \in \{1, 2, \ldots, n\}$ be such that $\gcd(m, n) = 1$.

(a) Suppose that $q$ is a prime such that $q \equiv m \pmod n$. Show that we
have $\sum_{j=0}^{r} c_j \zeta_n^{mj} \in q \cdot \mathbb{Z}[\zeta_n]$. Hint: take the $q$th power of the original relation.

(b) Use Dirichlet's Theorem (Theorem 21.32) to prove that $\sum_{j=0}^{r} c_j \zeta_n^{mj} =
0$. Hint: Exercise 21.14 may be helpful.

(c) Prove that $\zeta_n^m$ is a root of $\Phi_n$. Deduce that $\deg(\Phi_n) \ge \varphi(n)$.

(d) Use Corollary 21.24 to conclude that $\deg(\Phi_n) = \varphi(n)$ and that

$$\Phi_n(x) = \prod_{\substack{1 \le a \le n \\ \gcd(a, n) = 1}} (x - \zeta_n^a).$$

Observe that this allows us to generalize Corollary 21.23 from $q$ to $n$. For a
proof of Dirichlet's Theorem (using techniques from analysis), see [1]. For a
self-contained algebraic proof of the result of part (d) of this exercise, see e.g.
[5, Chapter 33].

Exercise 21.16. This exercise gives a practical formula to compute $\Phi_n$ that
does not involve complex roots of 1.

(a) Let $n$ be a positive integer. Use Exercise 21.15 to prove that

$$\Phi_n(x) = (x^n - 1) \Big/ \prod_{\substack{k \mid n \\ k < n}} \Phi_k(x). \tag{21.16}$$

(b) Use Equation 21.16 together with Proposition 21.22 to find explicit
formulas for $\Phi_n$ when $n \le 6$.


Exercise 21.17. Let $n$ be a positive integer. Prove that $n = \sum_{\substack{1 \le f \le n \\ f \mid n}} \varphi(f)$.

Exercise 21.18 (A Finite Multiplicative Subgroup of a Field is Cyclic). Let $F$
be a field, and let $G$ be a finite subgroup of $F^\times$. Set $n = |G|$.

(a) Prove that if $H \le G$ and $|H| = f$, then every element of $H$ is a root
of the polynomial $x^f - 1 \in F[x]$.




(b) Deduce that for every factor f of n, there is at most one subgroup of 
G of order f. 

(c) Use Exercise 18.3 and part (b) above to prove that for every factor $f$
of $n$, there are at most $\varphi(f)$ elements of $G$ of order $f$.

(d) Prove that for every factor $f$ of $n$, there are exactly $\varphi(f)$ elements of
$G$ of order $f$. Taking $f = n$, proceed to deduce that $G$ is cyclic. Hint: every
element of $G$ has some order $f$ dividing $n$; use the result of Exercise 21.17.


Exercise 21.19. Let $F$ be a field with $\mathrm{char}(F) = p > 0$. Let $n$ be a positive
integer, and let $p^e$ be the highest power of $p$ which divides $n$. Set $m := n/p^e$
and $f := x^n - 1 \in F[x]$. Let $K$ be a splitting field for $f$ over $F$.

(a) Let $g := x^m - 1 \in F[x]$. Prove that $K$ is also a splitting field for $g$ over
$F$. Hint: use Corollary 21.21.

(b) Prove that $K$ is a Galois extension of $F$. (You may find Exercise 16.13
useful.)


Exercise 21.20. Let $F$ be a field. Let $n$ be a positive integer, and let $K$ be
a splitting field over $F$ for the polynomial $x^n - 1 \in F[x]$. (We may think of
$K/F$ as a generalization of a cyclotomic field extension.)

(a) Let $R$ be the set of all roots of $x^n - 1$ in $K$. Prove that $R$ is a finite
subgroup of $K^\times$.

(b) Use Exercise 21.18 to deduce that $R = \langle \zeta \rangle$ for some $\zeta \in K^\times$.

(c) Show that we have $K = F[\zeta]$.

(d) Set $r = |R|$. Prove that $K$ is a finite Galois extension field of $F$, and that
$G := \mathrm{Gal}(K/F) \hookrightarrow (\mathbb{Z}/r\mathbb{Z})^\times$ (you can mostly imitate the proof of Proposition
21.17, but use Exercise 21.19 to help in case $\mathrm{char}(F) > 0$). Conclude that $G$
is abelian.

(e) Suppose that $\omega$ is an $n$th root of 1 in some extension field of $F$. Deduce
that $L := F[\omega]$ is a Galois extension of $F$, and $\mathrm{Gal}(L/F)$ is abelian. Hint:
Embed $L$ in a splitting field of $x^n - 1$ over $F$.


Exercise 21.21. This exercise continues Exercise 16.24. Let $K$ be a finite field,
and set $p = \mathrm{char}(K)$. Let $F$ be the prime subfield of $K$.

(a) Define a map $\sigma : K \to K$ by the formula $\alpha \mapsto \alpha^p$. Use Corollary 21.21
to help prove that $\sigma \in \mathrm{Gal}(K/F)$.

(b) Prove that $\mathrm{Gal}(K/F) = \langle \sigma \rangle$.

Exercise 21.22 (Fundamental Theory of Finite Fields). This exercise continues
Exercise 21.21. Let $F = \mathbb{Z}/p\mathbb{Z}$, where $p$ is prime. Let $K$ be a splitting field for
$f := x^{p^r} - x$ over $F$, and let $G := \mathrm{Gal}(K/F)$.

(a) Show that $|K| \ge p^r$, hence $|G| \ge r$. (Hint: does $f$ have any repeated
roots?)

(b) Let $\sigma$ be the $p$th-power map of Exercise 21.21, which generates $G$. Set
$\tau = \sigma^r \in G$. Let $\alpha \in K$ be a root of $f$. Prove that $\tau(\alpha) = \alpha$, and use this to
show that $\tau = \mathrm{id}_K$. Conclude that $|G| \le r$.

(c) Prove that a finite field of order $n$ exists if and only if $n$ is a prime
power, and that any two finite fields of the same order are isomorphic.


22


Solvability of Polynomial Equations by
Radicals

DOI: 10.1201/9781003252139-22


22.1 Radicals


In this chapter, we complete the original program of Évariste Galois, which
aims to understand the solution of polynomial equations in a single variable
$x$ by means of formulas involving a combination of ordinary arithmetic and
roots (square, cube, or higher). We will take our coefficients from a field $F$.
We start at the beginning. A linear equation

$$ax + b = 0, \tag{22.1}$$

to be truly linear, must have $a \ne 0$; then the unique solution in the field $F$ is

$$x = -\frac{b}{a}. \tag{22.2}$$

A quadratic equation

$$ax^2 + bx + c = 0 \tag{22.3}$$

has up to 2 distinct solutions, which are given by the quadratic formula

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \tag{22.4}$$

Already, some issues appear here. First, we have no definition for a "square
root" of an element in a general field $F$. In fact, the only situation when a
square root has an unambiguous interpretation for us occurs when we take the
square root of a non-negative real number, which by convention is another
non-negative real number. Second, while we do know how to interpret an integer
multiplied by a field element (for instance, $2a$ means $a + a$), we are in trouble
in case $\mathrm{char}(F) = 2$, for then the denominator is 0. The latter problem is easy
to side-step: we can take $F$ to have characteristic different from 2.

The first problem can be smoothed over in a natural way: we can say that
a square root (not "the" square root!) of an element $a \in F$ is an element $\rho$ of
some extension field $K$ of $F$ with the property that $\rho^2 = a$. An $n$th root can
be defined similarly for any positive integer $n$. As the reader may be aware,
the term "radical" is a synonym for "root." This leads us to the following
definitions.


Definition 22.1. Let $F$ be a field. An element $\rho$ of an extension field of $F$ is
called a (simple) radical over $F$ if $\rho^n \in F$ for some $n \in \mathbb{Z}^+$.


The next definition describes the kind of elements that can be found by
nesting roots inside each other any finite number of times and also applying
field operations.


Definition 22.2. Let $F$ be a field. An element $\alpha$ of an extension field of $F$ is
called a radical expression over $F$ if there is a tower of fields $F = F_0 \le F_1 \le
\cdots \le F_r$ such that $\alpha \in F_r$, and for all $i \in \{1, 2, \ldots, r\}$, we have $F_i = F_{i-1}[\rho_i]$
for some $\rho_i$ which is a simple radical over $F_{i-1}$.


Example 22.3. Working in R, it is easy to write down radical expressions over 


Q. Two examples are a := 2+ V9 and B := 4/ i =A 24 4/2, 


Example 22.4. Let $\alpha \in \mathbb{R}$ be a constructible number. By definition, this means
that there exists a tower of fields $\mathbb{Q} = F_0 \le F_1 \le \cdots \le F_n \le \mathbb{R}$ such that $\alpha \in
F_n$, and $[F_i : F_{i-1}] = 2$ for all $i \in \{1, \ldots, n\}$. It follows that $F_i = F_{i-1}[\rho_i]$ for
some $\rho_i$ which is a root of a quadratic polynomial $f_i = a_i x^2 + b_i x + c_i \in F_{i-1}[x]$.
By choosing $\rho_i$ instead to be a square root of the discriminant $D_i = b_i^2 - 4a_i c_i$
of $f_i$, we do not change the fields $F_i$; so we see that $\alpha$ is a radical expression
over $\mathbb{Q}$. As the reader may have surmised by now, a constructible number is
the same thing as a radical expression over $\mathbb{Q}$ in which all of the radical signs
are just square roots (and everything is in $\mathbb{R}$).


Our parenthetical note at the end of Example 22.4 is of more than passing
interest: it will be very convenient to consider radicals in some "big" field which
contains everything in the picture, such as $\mathbb{R}$ in that example. To see what
difficulties we wish to avoid, suppose that $F$ is a field, $\alpha$ is a radical expression
over $F$, and $\beta$ is a simple radical over $F[\alpha]$. We certainly want to be able to
prove that $\beta$ is then a radical expression over $F$. Now we know there is a tower
of fields $F = F_0 \le F_1 \le \cdots \le F_r$ such that $\alpha \in F_r$ and $F_i = F_{i-1}[\rho_i]$, where
each $\rho_i$ is a simple radical over $F_{i-1}$. The natural argument is to consider the
tower

$$F = F_0 \le F_1 \le \cdots \le F_r \le F_r[\beta]. \tag{22.5}$$

Unfortunately, the top step in this tower does not make sense. The trouble
is that, since $\beta$ and $F_r$ do not belong to a common field, there is no way to
adjoin $\beta$ to $F_r$; we do not know how to perform addition, say, with $\beta$ and
an arbitrary element of $F_r$. Instead, as it is, we can only adjoin $\beta$ to $F[\alpha]$.
With a common superfield $\Omega$, the situation is remedied. The two situations
are illustrated in Figure 22.1. Fortunately, we understand fields well enough
to fix this problem:


Lemma 22.5 (Existence of a Common Superfield). Suppose that $F$ is a field
and $\alpha_1, \ldots, \alpha_k$ are radical expressions over $F$. Then there is an extension
field $\Omega$ of $F$ such that for each $j \in \{1, \ldots, k\}$, there is a tower of fields $F =
F_{0,j} \le F_{1,j} \le \cdots \le F_{r(j),j} \le \Omega$ with $\alpha_j \in F_{r(j),j}$, where $F_{i,j} = F_{i-1,j}[\rho_{i,j}]$ and
$\rho_{i,j}$ is a simple radical over $F_{i-1,j}$.


FIGURE 22.1: Field towers without (left) and with (right) a common
superfield


Proof. We induct on $k$. By definition of radical expression, for each $j \in
\{1, \ldots, k\}$ there is a tower of fields $L_{0,j} = F \le L_{1,j} \le \cdots \le L_{r(j),j}$ such
that $\alpha_j \in L_{r(j),j}$ and $L_{i,j} = L_{i-1,j}[\rho_{i,j}]$, where $\rho_{i,j}$ is a simple radical over
$L_{i-1,j}$. Thus, the conclusion is immediate when $k = 1$. So suppose inductively
that there is an extension $\Psi$ which contains all the requisite field towers for
$\alpha_1, \ldots, \alpha_{k-1}$. Since each $\rho_{i,k}$ is algebraic over $L_{i-1,k}$, then $L_{r(k),k}$ is a finite
extension of $F$. So by Lemma 16.10, there is an extension field $\Omega$ of $\Psi$ which
admits an embedding $L_{r(k),k} \hookrightarrow \Omega$ over $F$. We take $F_{i,k}$ to be the image of
$L_{i,k}$ under this embedding.


Remark 22.6. A popular fix for the problem of the common superfield is to 
suppose in advance that all fields and elements in question lie within a single 
algebraically closed field. This is often convenient; the reader may wish to take 
this approach in what follows, and see how it simplifies some of the proofs. 
Instead of this approach, we shall make use of Lemma 16.10 when we need 
to, as in the proof of Lemma 22.5. But see Project 23.4 for a construction of 
an algebraically closed superfield in the case when all fields in question are 
finite, and Project 23.3 for a proof of the existence of an algebraically closed 
superfield in general, assuming the Axiom of Choice. 


22.2 Solvable Polynomials 


Next, we define what it means for a polynomial to be solvable by radicals: 
namely, that all its roots are radical expressions. 




Definition 22.7. Let F be a field, and let f € F[a] — F be a non-constant 
polynomial. We say that f is solvable by radicals (over F’) if there is an exten- 
sion field K of F such that f factors completely over K, and every root of f 
in K is a radical expression over F’. 


Example 22.8. Let $F$ be a field with $\operatorname{char}(F) \neq 2$, and let
$f = ax^2 + bx + c \in F[x]$ be a quadratic polynomial, with $a, b, c \in F$ and
$a \neq 0$. Set $D = b^2 - 4ac \in F$, and let $L$ be a splitting field for
$x^2 - D$ over $F$. Let $d$ be a root of $x^2 - D$ in $L$. Note that $2a \neq 0$
in $F$, since $\operatorname{char}(F) \neq 2$ and $a \neq 0$. Set
$\alpha = (-b + d)/(2a)$ and $\beta = (-b - d)/(2a)$. Then we have
$\alpha, \beta \in F[d]$, and
$(x - \alpha)(x - \beta) = x^2 - (\alpha + \beta)x + \alpha\beta
= x^2 + (b/a)x + (b^2 - d^2)/(4a^2) = x^2 + (b/a)x + (4ac)/(4a^2)
= x^2 + (b/a)x + (c/a) = f/a$.
Therefore, $f = a(x - \alpha)(x - \beta)$, so $f$ factors completely over $F[d]$.
Further, $F \le F[d]$ is a tower of the type required by Definition 22.2 to show
that $\alpha$ and $\beta$ are radical expressions over $F$. So every quadratic
polynomial over a field of characteristic different from 2 is solvable by radicals.
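
As a sanity check on Example 22.8, the following Python sketch tests the factorization $f = a(x - \alpha)(x - \beta)$ in the prime field with 7 elements, where all arithmetic is done modulo 7. It is only an illustration under stated assumptions: the names `P`, `inv`, and `check_quadratic` are hypothetical and not from the text, and the sketch only handles the case where $D$ already has a square root $d$ in the ground field (otherwise $d$ lives in a proper extension, which is not modelled here).

```python
# Sketch: check Example 22.8 in the prime field F_P (P odd), with arithmetic mod P.
# The names P, inv, and check_quadratic are illustrative, not from the text.
P = 7  # any odd prime works; char(F_P) = P != 2


def inv(u):
    """Multiplicative inverse of u in F_P (u must be nonzero mod P)."""
    return pow(u, P - 2, P)


def check_quadratic(a, b, c):
    """Return True if a*x^2 + b*x + c equals a*(x - alpha)*(x - beta) in F_P[x].

    Returns None when D = b^2 - 4ac has no square root inside F_P itself.
    """
    D = (b * b - 4 * a * c) % P
    roots_of_D = [d for d in range(P) if (d * d) % P == D]
    if not roots_of_D:
        return None  # d lies in a proper extension F_P[d]; not modelled here
    d = roots_of_D[0]
    alpha = ((-b + d) * inv(2 * a)) % P
    beta = ((-b - d) * inv(2 * a)) % P
    # a*(x - alpha)*(x - beta) = a*x^2 - a*(alpha + beta)*x + a*alpha*beta
    lin = (-a * (alpha + beta)) % P
    const = (a * alpha * beta) % P
    return (lin, const) == (b % P, c % P)


# Try every quadratic a*x^2 + b*x + c over F_P with a != 0.
results = [check_quadratic(a, b, c)
           for a in range(1, P) for b in range(P) for c in range(P)]
```

Every entry of `results` is either `True` (the factorization was confirmed) or `None` (the discriminant has no square root in the ground field), which is what the computation in the example predicts.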


Our main question is, which polynomials are solvable by radicals? Histor- 
ically, people worked in characteristic 0, where (as a consequence of Exam- 
ple 22.8) all quadratic polynomials are solvable by radicals. But what about 
degree-3 (“cubic”) polynomials? Is there a “cubic formula”? And what of 
higher degrees? 

Since the definition of radical expression applies to every element in the
top field $F_r$ in Definition 22.2, it makes sense to give one more definition:


Definition 22.9. Let $K$ be an extension field of $F$. We say that $K$ is a
solvable extension of $F$ (or that $K$ is solvable over $F$) if every element of
$K$ is a radical expression over $F$.


The next result says that a finite extension is solvable iff it is generated by 
radical expressions, iff there is a single tower of radical extensions containing 
it. 


Lemma 22.10. Let $K$ be a finite extension field of $F$. Then the following are
equivalent:

(1) $K$ is solvable over $F$;

(2) We can write $K = F[\alpha_1, \dots, \alpha_m]$ where
$\alpha_1, \dots, \alpha_m \in K$ and each $\alpha_i$ is a radical expression over $F$;

(3) We can write $F = F_0 \le F_1 \le \cdots \le F_r$ where $F_j = F_{j-1}[\rho_j]$,
each $\rho_j$ is a simple radical over $F_{j-1}$, and $K \le F_r$.


Proof. (1) $\Rightarrow$ (2): Suppose that $K$ is solvable over $F$. By Exercise 15.7,
we can write $K = F[\alpha_1, \dots, \alpha_m]$ for some $\alpha_1, \dots, \alpha_m \in K$.
Now each $\alpha_i$ is a radical expression over $F$, by Definition 22.9.

(2) $\Rightarrow$ (3): Suppose that such elements $\alpha_i$ ($1 \le i \le m$) exist. Then
by definition of radical expression, we can write
$\alpha_i \in F[\rho_{i,1}, \dots, \rho_{i,n(i)}]$ where $\rho_{i,j}$ is a simple radical
over $F[\rho_{i,1}, \dots, \rho_{i,j-1}]$. By Lemma 22.5, we may suppose that all the
$\rho_{i,j}$'s lie in a common superfield $\Omega$ of $F$. Let
$L = F[\rho_{1,1}, \dots, \rho_{1,n(1)}, \dots, \rho_{m,1}, \dots, \rho_{m,n(m)}]$.
By adjoining the $\rho_{i,j}$'s to $F$ in the order listed, we see that each
$\rho_{i,j}$ is a simple radical over the previous field. Further, we have
$\alpha_i \in L$ for each $i$, so $K \le L$, and we have established (3).

(3) $\Rightarrow$ (1): This follows immediately from Definitions 22.2 and 22.9.


Corollary 22.11. Let $F$ be a field, and let $f \in F[x] - F$. Let $K$ be a splitting
field for $f$ over $F$. Then $f$ is solvable by radicals over $F$ iff $K$ is solvable
over $F$.


Proof. By Theorem 15.21, we have $K = F[\alpha_1, \dots, \alpha_n]$, where the
$\alpha_i$ are the roots of $f$ in $K$. The result is now immediate from Lemma 22.10
and the fact that any two splitting fields for $f$ over $F$ are isomorphic over $F$
(Exercise 16.9).


In order to clarify condition (3) in Lemma 22.10, we make the following
definition. 


Definition 22.12. Let $K$ be a finite extension field of $F$. We say that $K$ is
a pure radical extension of $F$ if we can write $F = F_0 \le F_1 \le \cdots \le F_r$
where $F_j = F_{j-1}[\rho_j]$, each $\rho_j$ is a simple radical over $F_{j-1}$, and
$K = F_r$.


Thus, a finite extension is solvable iff it is a subfield of a pure radical 
extension. Next we ask, what do we get if we form a radical expression out of 
radical expressions? The answer is, another radical expression: 


Lemma 22.13 (Transitivity of Solvability). Let $F$, $K$, and $L$ be fields with
$F \le K \le L$. Then $L$ is solvable over $F$ iff both $L$ is solvable over $K$ and
$K$ is solvable over $F$.


Proof. ($\Rightarrow$): Suppose that $L$ is solvable over $F$. Let $\alpha \in L$. By
definition of radical expression, there is a tower of fields
$F = F_0 \le F_1 \le \cdots \le F_r$ such that $\alpha \in F_r$, and for all
$i \in \{1, 2, \dots, r\}$, we have $F_i = F_{i-1}[\rho_i]$ for some $\rho_i$ which is
a simple radical over $F_{i-1}$. Since $F_r$ is a finite extension of $F$, then we
also have $[F_r : F[\alpha]] < \infty$. Using Lemma 16.10, we can find an embedding
$F_r \hookrightarrow \Omega$ over $F[\alpha]$ for some extension field $\Omega$ of $L$;
so without loss of generality, we suppose that $F_r$ and $L$ are both contained in a
common superfield. Set $K_0 = K$ and $K_i = K_{i-1}[\rho_i]$ for
$i \in \{1, 2, \dots, r\}$. Then inductively, we see that $F_i \le K_i$ for each $i$.
Now some power of $\rho_i$ is in $F_{i-1}$, hence also in $K_{i-1}$; so $\rho_i$ is a
simple radical over $K_{i-1}$. Since $\alpha \in F_r \le K_r$, we have shown that
$\alpha$ is a radical expression over $K$. Therefore, $L$ is solvable over $K$. To see
that $K$ is solvable over $F$, let $\beta \in K$, and just observe that $\beta \in L$,
so $\beta$ is a radical expression over $F$: Definition 22.2 does not actually use the
field from which $\beta$ was chosen.

($\Leftarrow$): Suppose that $L$ is solvable over $K$ and $K$ is solvable over $F$. Let
$\alpha \in L$. Then there is a tower of fields $K = K_0 \le K_1 \le \cdots \le K_r$
such that $\alpha \in K_r$ and for all $i \in \{1, 2, \dots, r\}$, we have
$K_i = K_{i-1}[\rho_i]$ for some $\rho_i$ which is a simple radical over $K_{i-1}$. So
we have $K_i = K[\rho_1, \dots, \rho_i]$. We proceed by induction on $i$ to prove that
$K_i$ is solvable over $F$, which is equivalent (via Lemma 22.10) to proving that
$\rho_1, \dots, \rho_i$ are radical expressions over $F$. When $i = 0$, this statement
is vacuously true. So inductively suppose that $0 \le i < r$ and $K_i$ is solvable
over $F$. Now $a := \rho_{i+1}^n \in K_i$ for some $n \in \mathbb{Z}^+$, so $a$ is a
radical expression over $F$ by inductive hypothesis. So there is a tower of fields
$F = F_0 \le F_1 \le \cdots \le F_t$ such that $a \in F_t$, and for all
$j \in \{1, 2, \dots, t\}$, we have $F_j = F_{j-1}[r_j]$ for some $r_j$ which is a
simple radical over $F_{j-1}$. Since $F_t$ is a finite extension of $F$, then we also
have $[F_t : F[a]] < \infty$; so by Lemma 16.10, there is an extension field $\Omega$
of $K_{i+1}$ such that $F_t \hookrightarrow \Omega$ over $F[a]$. We suppose without
loss of generality that in fact $F_t \le \Omega$. The tower of fields
$F_0 \le F_1 \le \cdots \le F_t \le F_t[\rho_{i+1}]$ shows that $\rho_{i+1}$ is a
radical expression over $F$. This completes the induction. Now since $\alpha \in K_r$
and $K_r$ is solvable over $F$, then $\alpha$ is a radical expression over $F$. This
completes the proof.


22.3 Solvable Groups


Let us start with the simplest type of (non-trivial) solvable extension and
investigate its structure. So suppose that $\rho$ is a simple radical over a field
$F$; thus, $\rho^n =: a \in F$ for some $n \in \mathbb{Z}^+$. We see that $\rho$ is a
root of the polynomial $f(x) := x^n - a \in F[x]$. But rather than looking merely at
$F[\rho]$, we strongly suspect that we will be much better off by considering a
normal closure $L$ of $F[\rho]$ over $F$. For then (at least, if $f$ has no repeated
roots), $L$ is Galois over $F$, and we may be able to use group theory to study our
fields.

What are the roots of $f$? Well, $\rho$ is one root. An examination of Example
15.27 might lead us to suspect that every root of $f$ has the form $\rho$ multiplied
by an $n$th root of 1. In fact, this is correct. To get our normal closure, our idea
is to first adjoin all necessary roots of 1, and then adjoin our original radical.


Proposition 22.14. Let $F$ be a field, and let $\rho$ be a simple radical over $F$,
with $\rho^n = a \in F$ and $n \in \mathbb{Z}^+$. Let $f = x^n - a \in F[x]$. Let $L$
be a normal closure of $F[\rho]$ over $F$. Let $R$ be the set of all $n$th roots of 1
in $L$, and set $K = F(R)$. Then:

(1) $K = F[\zeta]$ for some $\zeta \in R$;

(2) $L = K[\rho]$;

(3) If $\operatorname{char}(F)$ does not divide $n$, then $L$ is Galois over $F$.


Proof. The case $\rho = 0$ being easy, we suppose that $\rho \neq 0$. Note that $R$ is
a subgroup of $L^\times$. By the Root-Factor Theorem (Theorem 14.24), $|R| < \infty$,
so by Exercise 21.18, $R$ is cyclic. Let $\zeta$ be a generator of $R$; then we have
(1). Since $K \le L$ and $\rho \in L$, we have $K[\rho] \le L$. Let
$g = \operatorname{Irr}(\rho, F, x)$. We may have $g \neq f$, but since $f(\rho) = 0$,
certainly $g$ divides $f$ in $F[x]$. Let $r$ be any root of $g$ in $L$. Then
$f(r) = f(\rho) = 0$, so $r^n = \rho^n = a$, and $(r/\rho)^n = 1$. Thus
$r/\rho \in R$, so $r/\rho \in K \le K[\rho]$. Since also $\rho \in K[\rho]$, we get
$r \in K[\rho]$. Thus $K[\rho]$ contains all roots of $g$ in $L$, and hence contains
$L$ (e.g. by Theorem 15.21 and Proposition 16.25). This gives (2), and we note that
$L$ is a splitting field for $g$ over $F$.

For (3), suppose that $\operatorname{char}(F) \nmid n$. Then
$n \cdot 1_F \in F^\times$; and since
$f'(x) = n \cdot x^{n-1} = (n \cdot 1_F)x^{n-1} \in F[x]$, the only possible root of
$f'$ is 0. But 0 is not a root of $f$. So by Proposition 16.15, $f$ has no repeated
roots, and therefore neither does $g$. Now $L$ is Galois over $F$ by Exercise 16.13.
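
The key observation in the proof, that every root of $x^n - a$ is $\rho$ times an $n$th root of 1, can be checked numerically. The sketch below works over the complex numbers with the arbitrary choices $n = 5$ and $a = 3$ (these values, and the names `rho`, `zeta`, `roots`, are assumptions made for illustration, not taken from the text). Floating-point arithmetic only gives approximate evidence, not a proof.

```python
import cmath

# Sketch over the complex numbers with F = Q; n = 5 and a = 3 are arbitrary choices.
n, a = 5, 3.0
rho = a ** (1.0 / n)                   # one root of x^n - a (the positive real one)
zeta = cmath.exp(2j * cmath.pi / n)    # a generator of the group R of n-th roots of 1

# Candidate roots rho * zeta^k for k = 0, ..., n-1.
roots = [rho * zeta ** k for k in range(n)]

# Each rho * zeta^k satisfies x^n = a up to rounding error ...
residuals = [abs(r ** n - a) for r in roots]

# ... and the n candidates are pairwise distinct, so they account for
# all n roots of the degree-n polynomial x^n - a.
min_gap = min(abs(roots[i] - roots[j]) for i in range(n) for j in range(i))
```

Here `residuals` should consist of numbers at the level of rounding error, while `min_gap` stays well away from 0, matching the fact that $f$ has no repeated roots in characteristic 0.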


To understand the structure of the field extension L/F under the hypothe- 
ses of Proposition 22.14, Galois theory tells us to look at the structure of the 
corresponding group G := Gal(L/F), and vice versa. The intermediate field 
K makes a nice way station. 


Lemma 22.15. Let $F$ be a field, and let $f = x^n - a \in F[x]$ be a polynomial
with $n \ge 1$ and $a \in F$. Let $\rho$ be a root of $f$ in some extension of $F$,
and $L$ be a normal closure of $F[\rho]$ over $F$. Suppose that
$\operatorname{char}(F) \nmid n$. Then $L$ is Galois over $F$, and there is a subgroup
$H$ of $G := \operatorname{Gal}(L/F)$ such that $H$ is abelian, $H \trianglelefteq G$,
and $G/H$ is abelian.


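Proof. By Proposition 22.14 we can say that $L = K[\rho]$, where $K = F[\zeta]$ for
some $\zeta \in L$ with $\zeta^n = 1$. Proposition 22.14 also gives that $L$ is Galois
over $F$. Let $G = \operatorname{Gal}(L/F)$, and let $H = \operatorname{Gal}(L/K)$.
Then $H \le G$ (by Lemma 16.29). Now $K$ is Galois over $F$, and
$\operatorname{Gal}(K/F)$ is abelian, by Exercise 21.20; so $H \trianglelefteq G$ and
$G/H$ is abelian, by the Fundamental Theorem of Galois Theory.

Let $\sigma \in H$. As in the proof of Proposition 22.14, all roots of $f$ in $L$ are
of the form $\zeta^k \rho$, where $k \in \mathbb{N}$. Since $f \in F[x] \le K[x]$, and
$\sigma \in \operatorname{Gal}(L/K)$, then $\sigma$ must send a root of $f$ in $L$ to
another root of $f$ (e.g. by Exercise 16.4). So $\sigma(\rho) = \zeta^k \rho$ for some
$k \in \mathbb{N}$. Since $\zeta \in K$, we have $\sigma(\zeta) = \zeta$. Now let
$\tau \in H$; then similarly, we have $\tau(\rho) = \zeta^\ell \rho$ for some
$\ell \in \mathbb{N}$, and $\tau(\zeta) = \zeta$. So
$\sigma\tau(\rho) = \sigma(\zeta^\ell \rho) = \sigma(\zeta^\ell)\sigma(\rho)
= \zeta^\ell \sigma(\rho) = \zeta^\ell \zeta^k \rho = \zeta^{\ell + k} \rho
= \zeta^{k + \ell} \rho = \zeta^k \zeta^\ell \rho = \zeta^k \tau(\rho)
= \tau(\zeta^k)\tau(\rho) = \tau(\zeta^k \rho) = \tau\sigma(\rho)$.
Since $L = K[\rho]$, every element of $H$ is determined by its action on $\rho$. So
$\sigma\tau = \tau\sigma$. But $\sigma$ and $\tau$ were arbitrary elements of $H$, so
$H$ is abelian.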


One observation we can now make about a normal closure $L$ of $F[\rho]$ over
$F$ (where $\rho$ is a simple radical over $F$) is that $L$ itself is a solvable
extension of $F$. This is because $L$ is composed of two steps, where in each step we
adjoin a simple radical: first adjoining a root of 1, and then adjoining $\rho$
itself. In particular, a normal closure of the simple radical extension $F[\rho]$ is
a pure radical extension of $F$. Can we generalize this result? Is a normal closure of
any pure radical extension also pure radical? Let's look at a small example.
What are the conjugates of $\beta := \sqrt[5]{3 + \sqrt[7]{2}}$ over $\mathbb{Q}$? If
our hope is realizable, then the conjugates should also be radical expressions over
$\mathbb{Q}$. Since $\beta$ is a root of $f := x^5 - (3 + \sqrt[7]{2})$, we may be
tempted to conjecture that the conjugates of $\beta$ are the roots of
$x^5 - (3 + \zeta_7^j \sqrt[7]{2})$ for $j \in \{0, 1, \dots, 6\}$: that is, we just
take roots of the polynomials formed by conjugating the coefficients of $f$. In fact,
this argument works.


Lemma 22.16. Let $K$ be a pure radical extension of $F$, and let $L$ be a normal
closure of $K$ over $F$. Then $L$ is also a pure radical extension of $F$.


Proof. We can write $F = F_0 \le F_1 \le \cdots \le F_r$ where
$F_j = F_{j-1}[\rho_j]$, $\rho_j$ is a simple radical over $F_{j-1}$, and $K = F_r$.
We induct on $r$. When $r = 0$, then $L = K = F$, which is certainly a pure radical
extension of $F$. Inductively, we have that the normal closure $L'$ of $F_{r-1}$ over
$F$ inside $L$ is a pure radical extension of $F$. Since $\rho_r$ is a simple radical
over $F_{r-1}$, we have $a := \rho_r^n \in F_{r-1}$ for some $n \in \mathbb{Z}^+$. Now
$F_r$ is algebraic over $F$ (since $F_r$ is a finite extension of $F$), so we can set
$g := \operatorname{Irr}(\rho_r, F, x)$. Let $R$ be the set of all roots of $g$ in
$L$. By Exercise 22.3, we have that $L'[R]$ is normal over $F$. Certainly
$L'[R] \le L$. On the other hand, $L$ is the smallest extension of $K$ inside $L$
which is normal over $F$, by definition of normal closure; but
$K = F_r = F_{r-1}[\rho_r] \le L'[\rho_r] \le L'[R]$. Therefore $L = L'[R]$. Let
$\beta \in R$. [Show that $\beta$ is a root of $x^n - \tilde{a}$ for some conjugate
$\tilde{a}$ of $a$.] Set $h(x) = \operatorname{Irr}(a, F, x)$ and
$\tilde{a} = \beta^n$. [Idea: $(x - a)$ divides $h$, i.e. $(x - \rho_r^n)$ divides
$h$; so $\rho_r$ will be a root of $x^n - a$. Replace $x$ by $x^n$ in $h$.] Set
$w(x) = h(x^n) \in F[x]$. Then $w(\rho_r) = h(\rho_r^n) = h(a) = 0$. It follows that
$g$ divides $w$ in $F[x]$. Since $g(\beta) = 0$, we must have $w(\beta) = 0$ as well;
therefore, $h(\beta^n) = 0$, so $\tilde{a}$ is a root of $h$. But
$a \in F_{r-1} \le L'$, and $L'$ is normal over $F$, so $\tilde{a}$ is also in $L'$.
Since $\beta$ is a root of $x^n - \tilde{a}$, then $\beta$ is a simple radical over
$L'$. So every element of $R$ is a simple radical over $L'$. Since $L = L'[R]$, and
$L'$ is a pure radical extension of $F$, then we can adjoin the finitely many
elements of $R$ one at a time in any order to $L'$ to see that $L$ is also a pure
radical extension of $F$.


To understand the group theory corresponding to solvable extension fields, 
we will start with a solvable extension which is also Galois. We embed this 
extension in a pure radical extension field realized as a tower of simple radical 
extensions, and replace each step in this tower with its normal closure. Our 
original solvable extension will correspond via Galois Theory to a quotient 
of the overall Galois group. The catch is that the characteristic of our fields 
cannot divide any of the degrees of the radicals in order for our machinery 
to work properly. So we will make the simplifying assumption that we are in 
characteristic 0. 


Proposition 22.17. Let $F$ be a field of characteristic 0, and let $K$ be a finite
solvable Galois extension of $F$. Set $G = \operatorname{Gal}(K/F)$. Then there is a
tower of groups $\{e_G\} = G_0 \le G_1 \le \cdots \le G_n$ such that $G$ is isomorphic
to a quotient of $G_n$, and for all $i \in \{1, \dots, n\}$, we have
$G_{i-1} \trianglelefteq G_i$ and $G_i/G_{i-1}$ is abelian.


Proof. By Lemma 22.10, we can realize $K$ as a subfield of a pure radical
extension $L$ of $F$; and by Lemma 22.16, without loss of generality we suppose
that $L$ is normal over $F$. We can write $F = F_0 \le F_1 \le \cdots \le F_r = L$
where $F_j = F_{j-1}[\rho_j]$, each $\rho_j$ is a simple radical over $F_{j-1}$, and
$K \le L$.

Construct a new tower of fields $L_0 \le L_2 \le \cdots \le L_{2r} \le L$ as follows.
Set $L_0 = F$, and for $j \ge 1$ let $L_{2j}$ be the normal closure of
$L_{2(j-1)}[\rho_j]$ over $L_{2(j-1)}$ in $L$. We see inductively that
$F_j \le L_{2j}$ for each $j$. Now $L_{2r} \le L$, but also $L = F_r \le L_{2r}$, so
$L_{2r} = L$. Because $\operatorname{char}(F) = 0$ and $L$ is normal over $F$, then
$L$ is Galois over $F$. By Lemma 22.15, we can say that $L_{2j}/L_{2(j-1)}$ is Galois
and $\Gamma_{2j} := \operatorname{Gal}(L_{2j}/L_{2(j-1)})$ decomposes as
$\{e\} \le \Gamma_{2j-1} \le \Gamma_{2j}$, where $\Gamma_{2j-1}$ is abelian,
$\Gamma_{2j-1} \trianglelefteq \Gamma_{2j}$, and $\Gamma_{2j}/\Gamma_{2j-1}$ is
abelian. For $j \in \{1, \dots, r\}$, set $L_{2j-1} = L_{2j}^{\Gamma_{2j-1}}$, the
fixed field of $\Gamma_{2j-1}$, and set $G_j = \operatorname{Gal}(L/L_{2r-j})$ for
$j \in \{0, 1, \dots, 2r\}$. Then we have
$\{e\} = G_0 \le G_1 \le \cdots \le G_{2r}$. Basic Galois theory (see Exercise 22.8)
shows that $G_{j-1} \trianglelefteq G_j$ and
$G_j/G_{j-1} \cong \operatorname{Gal}(L_{2r-j+1}/L_{2r-j})$, which is
$\Gamma_{2r-j+2}/\Gamma_{2r-j+1}$ (if $j$ is even) or $\Gamma_{2r-j}$ (if $j$ is
odd); in either case it is abelian. Set $H = \operatorname{Gal}(L/K)$. By the
Fundamental Theorem of Galois Theory, we have
$G \cong \operatorname{Gal}(L/F)/H = G_{2r}/H$, which completes the proof.


Proposition 22.17 gives us a framework for translating the notions of a pure 
radical extension and a solvable extension from field theory into group theory. 
Namely, we could define a finite group G to be “pure radical” if there is a 
tower of subgroups as in that proposition with G at the top, and a “solvable” 
group to be a quotient of a pure radical group. But it turns out that there is no 
need for two separate definitions: with groups, the two notions are equivalent. 
Thus, we make the following definition. 


Definition 22.18. Let $G$ be a finite group. We say that $G$ is solvable if there
exists a tower of subgroups $\{e_G\} = G_0 \le G_1 \le \cdots \le G_n = G$ such that
for all $i \in \{1, \dots, n\}$, we have $G_{i-1} \trianglelefteq G_i$ and
$G_i/G_{i-1}$ is abelian.


Remark 22.19. Although the property of being a subgroup is transitive (that
is, if $A \le B$ and $B \le C$, then $A \le C$), the property of being a normal
subgroup is not transitive: see Exercise 16.25.


Next we demonstrate the equivalence referred to above: we don’t get any 
new groups by taking quotients of solvable groups. 


Lemma 22.20. A quotient of a solvable group is solvable. That is, if G is a 
solvable group and Q is isomorphic to a quotient of G, then Q is also solvable. 


Proof. Let $G$ be a solvable group, and write
$\{e_G\} = G_0 \le G_1 \le \cdots \le G_n = G$ where for all $i \in \{1, \dots, n\}$,
we have $G_{i-1} \trianglelefteq G_i$ and $G_i/G_{i-1}$ is abelian. Suppose that
there is a surjective group homomorphism $v : G \to Q$, which by the Fundamental
Theorem of Group Homomorphisms is the same thing as saying that $Q$ is isomorphic to
a quotient of $G$. Set $Q_i = v(G_i)$. Then we have
$\{e_Q\} = Q_0 \le Q_1 \le \cdots \le Q_n = Q$ (using Theorem 7.13 (i)). Furthermore,
we have $Q_i \trianglelefteq Q_{i+1}$ by Exercise 7.8, and $Q_{i+1}/Q_i$ is
isomorphic to a quotient of $G_{i+1}/G_i$ by Exercise 22.10. A quotient of an abelian
group is abelian (by Exercise 11.5), so we are done.


Corollary 22.21. Let $F$ be a field of characteristic 0, and let $K$ be a finite
solvable Galois extension of $F$. Then $\operatorname{Gal}(K/F)$ is a solvable group.


Proof. Combine Proposition 22.17 with Lemma 22.20. 


Corollary 22.21 implies that if a polynomial in characteristic 0 is solvable 
by radicals, then the Galois group of its splitting field must be solvable. The 
natural question now is whether the converse holds. Again it makes sense to 
start with the simplest possible non-trivial field extension $K/F$ in terms of the
Galois group G—namely, when G is cyclic. Even in this restricted situation, 
we can make the situation still simpler by requiring G to have prime order: for 
then, G has no intermediate subgroups, and K/F' has no intermediate fields. 

So suppose that we have $K = F[\alpha]$, where $\alpha$ is a simple radical over $F$,
the extension $K/F$ is Galois, and $G = \operatorname{Gal}(K/F)$ has prime order $p$.
How can we recover $\alpha$ from $K$ and $F$? We know that the conjugates of $\alpha$
over $F$ are of the form $\zeta^j \alpha$ where $\zeta$ is a root of 1 in $K$. We also
know that $G$ is cyclic, say $G = \langle \sigma \rangle$. Thus, we could look for an
element $\alpha \in K$ such that $\sigma(\alpha) = \zeta^j \alpha$ for some
$j \in \mathbb{Z}^+$.

Because $G$ has order $p$, we know that $\sigma^p = e_G$, the identity function on
$K$. Let us now indulge in a wild flight of fancy for the rest of the paragraph.
You might agree that the identity function behaves as a sort of 1. With our
knowledge of cyclotomic polynomials, it is tempting to rewrite the preceding
equation as $\sigma^p - 1 = 0$; then factoring this equation via Corollary 21.15, we
would find $\prod_{j=0}^{p-1} (\sigma - \zeta^j) = 0$, where $\zeta^p = 1$. If we were
in a domain, we could conclude that $\sigma - \zeta^j = 0$ for some $j$, and thus that
for any $\alpha \in K$, we have $\sigma(\alpha) = \zeta^j \alpha$, as desired. But this
argument cannot possibly work as stated, since it seems to conclude that all elements
of $K$ are simple radicals over $F$.

Amazingly, we shall be able to take the ideas from the preceding paragraph 
and make them work sufficiently well to establish the converse we seek. The 
key is to find a suitable ring which contains both $\sigma$ and things that behave
like multiplication by the field elements $1, \zeta, \dots, \zeta^{p-1}$. By viewing
$\sigma$ in its role as a linear transformation, all this is possible. The reader is
encouraged to
review Exercises 13.15 through 13.18 at this point. For these exercises tell us 
that the set of all linear transformations from a vector space to itself naturally 
forms a ring, and the center of this ring is the set of all transformations which 
come from scalar multiplication by elements of the base field. 


Proposition 22.22. Let $K/F$ be a finite Galois extension of prime degree
$p = [K : F]$. Suppose that $\operatorname{char}(F) \neq p$. Then
$K \le F[\zeta, \alpha]$ where $\zeta$ is a root of $x^p - 1$ and
$\alpha^p \in F[\zeta]$. In particular, $K$ is a solvable extension of $F$.


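Proof. [We want to factor $x^p - 1$, so we must introduce $p$th roots of 1, which
may not be in $F$ or even in $K$.] Let $L$ be a splitting field for $x^p - 1$ over
$K$. By Exercise 21.20, we have $L = K[\zeta]$ for some $\zeta \in L$ such that all
the roots of $x^p - 1$ in $L$ are powers of $\zeta$. Let $\tilde{F} = F[\zeta]$.

[Field diagram: $L = K[\zeta]$ lies over both $K$ and $\tilde{F} = F[\zeta]$, which in
turn lie over $F$.]

Since $x^p - 1 = (x - 1)(x^{p-1} + x^{p-2} + \cdots + x + 1)$, we see that either
$\zeta = 1$ or $\zeta$ is a root of $x^{p-1} + x^{p-2} + \cdots + x + 1$. In either
case, we have
$[\tilde{F} : F] = \deg(\operatorname{Irr}(\zeta, F, x)) \le p - 1$. Since
$[K : F] = p$ by hypothesis, then $p$ divides $[L : F]$, and since
$p \nmid [\tilde{F} : F]$, we must have $p \mid [L : \tilde{F}]$. By Proposition
19.13, we know that $L$ is Galois over $\tilde{F}$ and
$G := \operatorname{Gal}(L/\tilde{F}) \hookrightarrow \operatorname{Gal}(K/F)$. It
follows that $[L : \tilde{F}] = p = |G|$.

Let $R$ be the ring of all linear transformations from $L$ to $L$ as an
$\tilde{F}$-vector space. Then we have $G \subseteq R$ (e.g. by Exercise 16.22). Let
$C = C(R)$ be the center of $R$; then we have
$C = \{\mu_a : a \in \tilde{F}\}$, where $\mu_a$ is the multiplication-by-$a$ map
from $L$ to $L$; that is, $\mu_a(b) = ab$ for $b \in L$. Since $|G| = p$, which is
prime, then $G$ is cyclic by Theorem 8.20. Write $G = \langle \sigma \rangle$. By
Exercise 14.12, the ring
$S := \{\sum_{j=0}^{n} c_j \sigma^j : c_j \in C, n \in \mathbb{Z}^+\}$ is a
commutative subring of $R$. Thus, we can consider the evaluation-at-$\sigma$ map
$\varepsilon_\sigma : C[x] \to S$, and we have $S = C[\sigma]$. Now
$C \cong \tilde{F}$ (as rings) via the map $\mu_a \mapsto a$, by Exercise 13.18. This
map extends to an isomorphism $C[x] \cong \tilde{F}[x]$ (by Lemma 16.1), so we can
compose with $\varepsilon_\sigma$ to get a ring homomorphism
$\tau : \tilde{F}[x] \to S$, where
$\tau(\sum_j c_j x^j) = \sum_j \mu_{c_j} \sigma^j$. Writing 1 for $1_{\tilde{F}}$, we
have $\mu_1 = \mathrm{id}_L = e_G$, so $\tau(x^p - 1) = \sigma^p - \mu_1 = 0_S$ (by
Corollary 8.25). Since $\operatorname{char}(\tilde{F}) = \operatorname{char}(F) \neq p$,
the polynomial $x^p - 1$ has no repeated roots in $\tilde{F}$ (by Proposition 16.15),
so the order of $\zeta$ in $\tilde{F}^\times$ is $p$ and not 1. Thus $x^p - 1$ factors
in $\tilde{F}[x]$ as $x^p - 1 = \prod_{j=0}^{p-1} (x - \zeta^j)$. So we have
$\tau(x^p - 1) = \prod_{j=0}^{p-1} (\sigma - \mu_{\zeta^j}) = 0$. Set
$w_k = \prod_{j=0}^{k} (\sigma - \mu_{\zeta^j}) \in S$ for
$k \in \{0, 1, \dots, p-1\}$. Let $\beta \in L - \tilde{F}$; we know such an element
exists because $[L : \tilde{F}] = p > 1$ (using Exercise 15.5). If
$\sigma(\beta) = \beta$, then $\beta \in L^{\{\sigma\}}$ (see Exercise 16.23)
$= L^{\langle \sigma \rangle} = L^G = \tilde{F}$, which is false. Therefore,
$\sigma(\beta) \neq \beta$, and $w_0(\beta) \neq 0$. Set
$n = \max\{k \in \mathbb{N} : w_k(\beta) \neq 0\}$; we have $n < p - 1$, since
$w_{p-1} = 0$. Set $\alpha = w_n(\beta)$. Now by construction we have
$0 = w_{n+1}(\beta) = (\sigma - \mu_{\zeta^{n+1}})(w_n(\beta))
= (\sigma - \mu_{\zeta^{n+1}})(\alpha) = \sigma(\alpha) - \zeta^{n+1}\alpha$. So
$\sigma(\alpha) = \zeta^{n+1}\alpha$. It follows that
$\sigma(\alpha^p) = (\sigma(\alpha))^p = \zeta^{p(n+1)}\alpha^p = \alpha^p$ (using
that $\zeta^p = 1$). Since $\alpha^p$ is fixed by $\sigma$, we see as before that
$\alpha^p \in \tilde{F}$. Now since $1 \le n + 1 \le p - 1$, we have
$\zeta^{n+1} \neq 1$, and thus $\sigma(\alpha) \neq \alpha$. It follows by Galois
Theory that $\alpha \notin \tilde{F}$. Since $[L : \tilde{F}]$ is prime, and
$\tilde{F} \le \tilde{F}[\alpha] \le L$, we must have $\tilde{F}[\alpha] = L$. This
completes the proof.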
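
To see the construction of $\alpha = w_n(\beta)$ in action, here is a small numerical sketch for one concrete cyclic cubic extension. Everything specific in it is an assumption made for illustration and is not taken from the text: we take $F = \mathbb{Q}$, $p = 3$, and $K$ generated by the three real numbers $2\cos(2\pi \cdot 2^k/7)$ (the roots of $x^3 + x^2 - 2x - 1$), which a generator $\sigma$ permutes cyclically. We also assume, without proof here, that these three roots form a basis of $L = K[\zeta]$ over $\tilde{F} = \mathbb{Q}[\zeta]$, so that elements of $L$ can be stored as coefficient vectors. The names `sigma`, `step`, and `value` are hypothetical helpers.

```python
import cmath
import math

p = 3
zeta = cmath.exp(2j * cmath.pi / p)   # a primitive cube root of 1

# r[0], r[1], r[2]: the real roots 2cos(2*pi*2^k/7) of x^3 + x^2 - 2x - 1.
# Assumption: sigma permutes them cyclically, r[0] -> r[1] -> r[2] -> r[0].
r = [2 * math.cos(2 * math.pi * (2 ** k) / 7) for k in range(p)]


def sigma(v):
    """sigma sends c0*r0 + c1*r1 + c2*r2 to c0*r1 + c1*r2 + c2*r0 (it fixes the c_i)."""
    return [v[-1]] + v[:-1]


def step(v, j):
    """Apply the operator (sigma - mu_{zeta^j}) to the coefficient vector v."""
    return [s - zeta ** j * c for s, c in zip(sigma(v), v)]


def value(v):
    """Numerical value of the element of L with coefficient vector v."""
    return sum(c * ri for c, ri in zip(v, r))


beta = [1, 0, 0]            # beta = r0, which sigma does not fix
w, n = step(beta, 0), 0     # w = w_0(beta) = sigma(beta) - beta
while True:
    nxt = step(w, n + 1)    # candidate for w_{n+1}(beta)
    if max(abs(c) for c in nxt) < 1e-9:
        break               # w_{n+1}(beta) = 0, so n is the maximum in the proof
    w, n = nxt, n + 1
alpha = w                   # alpha = w_n(beta)

# The proof predicts sigma(alpha) = zeta^(n+1) * alpha.
lhs = value(sigma(alpha))
rhs = zeta ** (n + 1) * value(alpha)
```

In this sketch `lhs` and `rhs` agree up to rounding error, so $\alpha^3$ is fixed by $\sigma$, which is the mechanism the proof uses to exhibit $\alpha$ as a simple radical over $\tilde{F}$.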


When a field extension is Galois of prime degree, at least in characteristic 0, 
then just knowing the Galois group structure is enough to guarantee that the 
field extension is solvable. How does this help with a general solvable Galois
group? We show next how any solvable group can be relentlessly broken down
into extensions of prime order. 


Proposition 22.23. Let $G$ be a non-trivial finite group. Then $G$ is solvable
iff there exists a tower of subgroups $\{e_G\} = G_0 \le G_1 \le \cdots \le G_n = G$
such that for all $i \in \{1, \dots, n\}$, we have $G_{i-1} \trianglelefteq G_i$ and
$G_i/G_{i-1}$ is cyclic of prime order.


Proof. ($\Rightarrow$): From the definition of solvable group, there is a tower
$\{e_G\} = G_0 \le G_1 \le \cdots \le G_n = G$ such that for all
$i \in \{1, \dots, n\}$, we have $G_{i-1} \trianglelefteq G_i$ and $G_i/G_{i-1}$ is
abelian. Since $G$ is non-trivial, we may assume that all the inclusions here are
strict. If the result were true for all non-trivial finite abelian groups, then we
could lift each tower for $G_i/G_{i-1}$ up to $G_i$, and they would fit together to
form a tower with the desired properties (see Exercises 7.17 and 19.3). So we only
need to consider the case when $G$ is abelian. We proceed by induction on $|G|$. In
case $|G| = 2$, then $G$ is already cyclic of prime order, so we take $n = 1$ and
$G_1 = G$. Inductively, suppose that the result holds for abelian groups of order
less than $|G|$. Let $p$ be a prime which divides $|G|$. By Cauchy's Theorem, there
is an element $g \in G$ of order $p$. Let $H = \langle g \rangle$. Then we have
$H \trianglelefteq G$ (since $G$ is abelian), and $G/H$ is also abelian, of order
less than $|G|$. If $G/H$ is trivial, then $G$ is already cyclic of prime order, so
we are done. Otherwise, by inductive hypothesis, $G/H$ has a tower of the desired
type, which we lift to $G$ and insert after the tower $\{e_G\} \le H$ to finish the
proof.

($\Leftarrow$): Since every cyclic group is abelian, this direction is immediate from
the definition of solvable group.


Now we are ready to characterize those polynomials over a field of charac- 
teristic 0 which are solvable by radicals, solely in terms of the group-theoretic 
properties of their Galois groups. 


Theorem 22.24. Let $F$ be a field of characteristic 0, and let $f \in F[x] - F$.
Let $K$ be a splitting field for $f$ over $F$, and set $G = \operatorname{Gal}(K/F)$.
Then $f$ is solvable by radicals iff $G$ is a solvable group.


Proof. ($\Rightarrow$): Suppose that $f$ is solvable by radicals. Then by Corollary
22.11, $K$ is a solvable extension of $F$. So by Corollary 22.21, $G$ is solvable.

($\Leftarrow$): Suppose that $G$ is a solvable group. By Proposition 22.23, there is
a tower of groups $\{e_G\} = G_0 \le G_1 \le \cdots \le G_n = G$ such that for all
$j \in \{1, \dots, n\}$, we have $G_{j-1} \trianglelefteq G_j$ and $G_j/G_{j-1}$ is
cyclic of prime order. Set $F_j = K^{G_{n-j}}$, the fixed field of $G_{n-j}$. Then we
have $F = F_0 \le F_1 \le \cdots \le F_n = K$, and (by Exercise 22.8) each $F_j$ is
Galois over $F_{j-1}$ with
$\operatorname{Gal}(F_j/F_{j-1}) \cong G_{n-j+1}/G_{n-j}$. By Proposition 22.22, we
know that each $F_j$ is a solvable extension of $F_{j-1}$. Now applying Lemma 22.13
repeatedly, we see that $K$ is a solvable extension of $F$. So by Corollary 22.11,
$f$ is solvable by radicals.


22.4 Galois Groups in the Generic Case 


In light of Theorem 22.24, the question of which polynomials are solvable 
by radicals (in characteristic 0) is intimately related to the question of the 
solvability of Galois groups. If we want to go beyond the quadratic formula 
to a cubic formula which gives the general solution to a cubic polynomial, for 
instance, then we need to understand which groups occur as Galois groups of 
the splitting field of a cubic polynomial. 

So let us summarize some relevant results which we have already seen in
previous chapters. Let $F$ be a field of characteristic 0, let $f \in F[x] - F$, and
let $K$ be a splitting field for $f$ over $F$. Let $R$ be the set of all roots of $f$
in $K$, and set $G = \operatorname{Gal}(K/F)$. We know from Exercise 19.9 that
$G \hookrightarrow \operatorname{Sym}(R)$ via the action of $G$ on the roots of $f$;
an element of $G$ is determined by what it does to these roots. We also know that
$|R| \le \deg(f) =: n$, so that
$\operatorname{Sym}(R) \cong S_{|R|} \hookrightarrow S_n$ (by Exercises 9.5 and 9.6).
Thus we have $G \hookrightarrow S_n$. It is natural to ask whether this embedding can
ever be an isomorphism: that is, whether every possible permutation of the roots can
yield an automorphism of $K$ over $F$.

In order for this to happen, we first must have $|R| = n$. Write
$R = \{r_1, \dots, r_n\}$. Now if $r_1, \dots, r_n$ were independent variables in a
polynomial ring, then we would be able to permute the $r_i$'s at will (all the
$r_i$'s would look alike in a very strong sense) and we could realize every element
of $S_n$ as an automorphism. Let us try this approach.


Proposition 22.25. Let $C$ be a field and let $n$ be a positive integer. Let
$S = C[r_1, \dots, r_n]$ be the polynomial ring in the $n$ independent variables
$r_1, \dots, r_n$ with coefficients in $C$. Set $K = k(S) = C(r_1, \dots, r_n)$, the
field of fractions of $S$. Let $f = \prod_{i=1}^{n} (x - r_i) \in K[x]$. Write
$f = x^n + \sum_{j=0}^{n-1} a_j x^j$ with $a_j \in S$, and set
$F = C(a_0, \dots, a_{n-1})$, so that $f \in F[x]$. Then $K$ is a splitting field for
$f$ over $F$, and we have $\operatorname{Gal}(K/F) \cong S_n$.


Proof. By inspection, $f$ factors completely over $K$. Suppose that $L$ is a field
such that $f$ factors completely over $L$ and $F \le L \le K$. Then $C \le L$ and
$L$ must contain each $r_i$, so we have $S = C[r_1, \dots, r_n] \le L$. Therefore,
$K = k(S) \le k(L) = L$. So $K$ is a splitting field for $f$ over $F$. Since $f$ has
no repeated roots (we assumed that $r_1$ through $r_n$ were algebraically independent
over $C$), then $K/F$ is Galois, by Exercise 16.13. Let $R = \{r_1, \dots, r_n\}$,
and let $\sigma \in \operatorname{Sym}(R)$. By Exercise 14.11, there is a unique ring
homomorphism $\tilde{\sigma} : S \to S$ over $C$ such that
$\tilde{\sigma}(r_i) = \sigma(r_i)$ for all $i$. By applying the Universal Property
of Localization (Lemma 20.9) to the composite map
$S \xrightarrow{\tilde{\sigma}} S \hookrightarrow k(S) = K$, we see that
$\tilde{\sigma}$ extends to a ring homomorphism $\tau : K \to K$. It is easy to see
that $\tau$ is bijective (its inverse is induced by $\sigma^{-1}$), so
$\tau \in \operatorname{Aut}(K)$. Let $s : K[x] \to K[x]$ be the extension of $\tau$
to $K[x]$ of Lemma 16.1. Since $s$ fixes $x$ and permutes the $r_i$'s, we have
$s(f) = f$. From this it follows that $s$ fixes each $a_j$, and thus $s$ fixes every
element of $F$; hence so does $\tau$. Thus, $\tau \in \operatorname{Gal}(K/F) =: G$.
Also, we have $\tau|_R = \sigma$. The map $G \to \operatorname{Sym}(R)$,
$\tau \mapsto \tau|_R$ of Exercise 19.9 is therefore surjective. Hence
$G \cong \operatorname{Sym}(R) \cong S_n$.
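
The step "$s$ permutes the $r_i$'s, so $s(f) = f$ and each $a_j$ is fixed" can be illustrated with a short Python sketch. It expands $\prod (x - r_i)$ for every ordering of a sample list of roots and checks that the coefficients never change. This is only an illustration under stated assumptions: the sample roots are specific rational numbers chosen arbitrarily (so they are not algebraically independent variables as in the proposition), and the helper name `monic_from_roots` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def monic_from_roots(roots):
    """Coefficients [a_0, ..., a_{n-1}, 1] of the product of (x - r) over r in roots."""
    coeffs = [Fraction(1)]                                 # the constant polynomial 1
    for r in roots:
        # Multiply the current polynomial by (x - r):
        shifted = [Fraction(0)] + coeffs                   # x * poly
        scaled = [-r * c for c in coeffs] + [Fraction(0)]  # -r * poly
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs


# Arbitrary sample values standing in for r_1, ..., r_4.
sample_roots = [Fraction(2), Fraction(-1, 3), Fraction(5), Fraction(7, 2)]
reference = monic_from_roots(sample_roots)

# Permuting the roots leaves every coefficient a_j unchanged.
all_equal = all(monic_from_roots(list(perm)) == reference
                for perm in permutations(sample_roots))
```

The coefficients are, up to sign, the elementary symmetric functions of the roots, which is why every permutation of the $r_i$'s fixes the field $F = C(a_0, \dots, a_{n-1})$.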


In a sense, the polynomial f of Proposition 22.25 is the most “generic” 
monic polynomial of degree n; if f is solvable by radicals, then we may suspect 
that any degree-n polynomial is too. This hunch is borne out in the following 
results. 


Lemma 22.26. Let $G$ be a finite group. Then:
(1) If $H \le G$ and $G$ is solvable, then $H$ is also solvable.
(2) Suppose $N \trianglelefteq G$. Then $G$ is solvable iff both $N$ and $G/N$ are
solvable.


Proof. (1): By Cayley's Theorem together with Exercise 9.6, we have
$G \hookrightarrow S_n$ for some $n$. By Proposition 22.25 with $C = \mathbb{Q}$,
there is a finite Galois extension $K/F$, with $\operatorname{char}(F) = 0$, such
that $S := \operatorname{Gal}(K/F) \cong S_n$. Without loss of generality, we suppose
$H \le G \le S$. Set $A = K^G$ and $B = K^H$. Then $F \le A \le B \le K$. Now $K$ is
Galois over $A$, with $\operatorname{Gal}(K/A) = G$, and similarly,
$\operatorname{Gal}(K/B) = H$. By Exercise 16.13, $K$ is a splitting field over $A$
for some polynomial $f \in A[x]$. Since $G$ is solvable, then by Theorem 22.24, $f$
is solvable by radicals over $A$; so by Corollary 22.11, $K$ is a solvable extension
of $A$. By Lemma 22.13, $K$ is also a solvable extension of $B$. By Exercise 16.13
again, we can find a polynomial $g \in B[x]$ such that $K$ is a splitting field for
$g$ over $B$, and then by Corollary 22.11 $g$ is solvable by radicals over $B$.
Therefore $H$ is a solvable group by Theorem 22.24.

The proof of (2) is left to the reader as Exercise 22.4. 


Remark 22.27. There is a more direct, purely group-theoretic proof of Lemma 
22.26 (see Exercise 22.5). But the proof given above shows how a universal 
family of groups (the symmetric groups, in this case) can translate “obvious” 
facts about radicals to not-quite-so-obvious facts about groups, via Galois 
theory. 


Corollary 22.28. Let $n$ be a positive integer. The following are equivalent:
(1) Every polynomial of degree $n$ over a field of characteristic 0 is solvable
by radicals;
(2) $S_n$ is a solvable group.


Proof. ($\Rightarrow$): Suppose (1) holds. By Proposition 22.25, there is a field $F$
of characteristic 0 and a polynomial $f \in F[x] - F$ of degree $n$ such that
$G := \operatorname{Gal}(K/F) \cong S_n$, where $K$ is a splitting field for $f$ over
$F$. Now $f$ is solvable by radicals (by (1)), so $G$ is a solvable group by Theorem
22.24.

($\Leftarrow$): Suppose that $S_n$ is solvable. Let $F$ be a field of characteristic
0, and let $f \in F[x]$ be a polynomial of degree $n$. Let $K$ be a splitting field
for $f$ over $F$, and let $R$ be the set of all roots of $f$ in $K$. Then $K$ is
Galois over $F$, and we have
$G := \operatorname{Gal}(K/F) \hookrightarrow \operatorname{Sym}(R) \hookrightarrow S_n$.
So $G$ is isomorphic to a subgroup of $S_n$. Since $S_n$ is solvable by hypothesis,
then $G$ is also solvable, by Lemma 22.26. So $f$ is solvable by radicals, by Theorem
22.24.


Remark 22.29. For a counterexample to the biconditional of Corollary 22.28 
in a field of positive characteristic, see Exercise 22.18. 


22.5 Which Groups Are Solvable? 


It is high time that we look into methods for determining whether a given
group is solvable, and in particular which of the symmetric groups $S_n$ are
solvable. In the definition of solvable group, we need a normal subgroup whose
corresponding quotient group is abelian. Let $G$ be a finite group. Recall from
Example 10.7 and Definition 10.8 that the commutator subgroup of $G$ has
exactly these properties, and moreover is in a sense optimal in this regard.
By forming commutator subgroups repeatedly, we can decide whether $G$ is
solvable; that is the content of the following result.


Lemma 22.30. Let $G$ be a finite group. Form a sequence $G_0, G_1, G_2, \dots$ by
setting $G_0 = G$ and taking $G_j$ to be the commutator subgroup of $G_{j-1}$ for
each $j \ge 1$. This sequence must stabilize, and $G$ is solvable iff we have
$G_n = \{e_G\}$ for all sufficiently large $n$.


Proof. From the definition of the commutator subgroup, for each $j \ge 0$ we
have $G_{j+1} \trianglelefteq G_j$ and $G_j/G_{j+1}$ is abelian. So $G_0 \supseteq G_1 \supseteq \cdots$. Since $G$ is finite,
the sequence must stabilize at some $G_m$: that is, we must have $G_n = G_m$ for
all $n \ge m$. We proceed to prove the second statement of the result.

($\Rightarrow$): We prove the contrapositive. So suppose that $G_m \ne \{e_G\}$. Then
$G_{m+1} = G_m$ by construction. Assume for a contradiction that $G$ is solvable.
Since $G_m \le G$, then $G_m$ is also solvable, by Lemma 22.26. So there is a
sequence of subgroups of $G_m$,

$$\{e_G\} = H_0 \le H_1 \le H_2 \le \cdots \le H_r = G_m,$$

such that $H_j \trianglelefteq H_{j+1}$ and $H_{j+1}/H_j$ is abelian for each $j \in \{0, \ldots, r-1\}$. We
may remove any instances where $H_j = H_{j+1}$ and suppose that $H_j \subsetneq H_{j+1}$ for
each $j$ (this uses the fact that $G_m$ is non-trivial). Thus, in particular, we have
$H_{r-1} \trianglelefteq G_m$ with $H_{r-1} \ne G_m$, and $G_m/H_{r-1}$ is abelian. Now since $G_{m+1} = G_m$,
the commutator subgroup of $G_m$ is itself, and the largest abelian quotient of
$G_m$ is the trivial group; but $G_m/H_{r-1}$ is a non-trivial abelian quotient of $G_m$,
a contradiction. This shows that $G$ is not solvable.

($\Leftarrow$): Suppose that $G_m = \{e_G\}$ for some $m$. Then we have
$\{e_G\} = G_m \le G_{m-1} \le \cdots \le G_0 = G$, with $G_j \trianglelefteq G_{j-1}$ for each $j \in \{1, \ldots, m\}$, and each
$G_{j-1}/G_j$ is abelian. This is what it means to say that $G$ is solvable.
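The procedure in Lemma 22.30 can be tried by brute force on small symmetric groups. The following is a minimal sketch, not part of the text: it assumes permutations of {0, ..., n-1} are stored as tuples, that products are composed right to left, and the function names are our own choices for illustration.

```python
from itertools import permutations


def compose(p, q):
    """Return the permutation 'apply q first, then p'."""
    return tuple(p[q[i]] for i in range(len(p)))


def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def generated_subgroup(generators, n):
    """Close a set of permutations under products.

    In a finite group, closure under multiplication already gives the
    generated subgroup, so inverses need not be added separately.
    """
    identity = tuple(range(n))
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements


def commutator_subgroup(group, n):
    """Subgroup generated by all commutators a b a^-1 b^-1 with a, b in group."""
    commutators = {
        compose(compose(a, b), compose(inverse(a), inverse(b)))
        for a in group
        for b in group
    }
    return generated_subgroup(commutators, n)


def derived_series_orders(n):
    """Orders |G_0|, |G_1|, ... for G_0 = S_n, up to where the series stabilizes."""
    group = set(permutations(range(n)))
    orders = [len(group)]
    while True:
        nxt = commutator_subgroup(group, n)
        if len(nxt) == orders[-1]:  # nxt is a subgroup of group, so equal size means equal
            return orders
        orders.append(len(nxt))
        group = nxt


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, derived_series_orders(n))
```

For n = 3 and n = 4 the list of orders ends in 1, so the lemma says those groups are solvable; for n = 5 the list stops at a subgroup of order 60, which is the behavior established later in Proposition 22.38.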


Example 22.31. $S_1$ is a trivial group, so the sequence $S_1 \le S_1$ satisfies the
conditions of Definition 22.18, and $S_1$ is solvable. This corresponds to the fact
that a linear equation is solvable over a field.




Example 22.32. $S_2$ has order 2, which is prime, so $S_2$ is abelian. Hence the
sequence $\{e\} \le S_2$ shows that $S_2$ is solvable. Similarly, we see that every
abelian group is solvable. We can see this directly using Definition 22.18, or
from Lemma 22.30, since the commutator subgroup of an abelian group is
trivial.

Let’s go farther: Proposition 22.22 is a constructive result, which gives a
formula for finding a generator of a solvable field extension starting with any
element from outside of the ground field. If we apply this result to a root
of an irreducible quadratic polynomial $f = ax^2 + bx + c \in F[x]$, where $F$
is a field with $\mathrm{char}(F) \ne 2$, then the quadratic formula should materialize
out of the mist. So let $K$ be a splitting field for $f$ over $F$, and let $\rho \in K$
be a root of $f$. Then $G := \mathrm{Gal}(K/F)$ has order 2, and is generated by an
element $\sigma$. We have $f = a(x - \rho)(x - \sigma(\rho))$. Therefore, $\rho + \sigma(\rho) = -b/a$
and $\rho\sigma(\rho) = c/a$. Let $\zeta$ be a primitive root of $x^2 - 1$ in $F$, i.e., $\zeta = -1$,
and set $\alpha = (\sigma - \mathrm{id}_K)(\rho) = \sigma(\rho) - \rho$. We should have $\alpha^2 \in F$; so
we compute

$$\alpha^2 = (\sigma(\rho) - \rho)^2 = \sigma(\rho^2) - 2\rho\sigma(\rho) + \rho^2 = (-b/a)^2 - 4\rho\sigma(\rho) = (b/a)^2 - 4c/a = (b^2 - 4ac)/a^2.$$

Indeed, $\alpha^2 \in F$. Since $\rho + \sigma(\rho) = -b/a$, we have

$$\rho = \tfrac{1}{2}\bigl((\rho + \sigma(\rho)) + (\rho - \sigma(\rho))\bigr) = \tfrac{1}{2}(-b/a - \alpha) = \tfrac{1}{2}\Bigl(-b/a \pm \sqrt{(b^2 - 4ac)/a^2}\Bigr) = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$
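The two identities used in this derivation can be checked numerically. The sketch below is only a sanity check under our own assumptions: it works over the complex numbers with floating-point arithmetic, it starts from two chosen roots and builds the coefficients from them, and the name `check_resolvent` is hypothetical.

```python
def check_resolvent(a, rho, sigma_rho, tol=1e-9):
    """Check alpha^2 = (b^2 - 4ac)/a^2 and rho = (1/2)(-b/a - alpha).

    The coefficients b and c are built from the two roots, so that
    f = a (x - rho)(x - sigma_rho) = a x^2 + b x + c.
    """
    b = -a * (rho + sigma_rho)   # from rho + sigma(rho) = -b/a
    c = a * rho * sigma_rho      # from rho * sigma(rho) = c/a
    alpha = sigma_rho - rho      # the element alpha of the derivation
    lhs = alpha ** 2
    rhs = (b * b - 4 * a * c) / (a * a)
    recovered_rho = 0.5 * (-b / a - alpha)
    return abs(lhs - rhs) < tol and abs(recovered_rho - rho) < tol


if __name__ == "__main__":
    # Complex-conjugate roots, as for an irreducible real quadratic.
    print(check_resolvent(3, 1 + 2j, 1 - 2j))
    # Two real roots; the identities do not depend on irreducibility.
    print(check_resolvent(2.0, 0.5, -4.0))
```

Swapping the roles of the two roots replaces $\alpha$ by $-\alpha$, which corresponds to the $\pm$ sign in the quadratic formula.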

Example 22.33. We have $|S_3| = 6$. By Sylow’s Theorem, $S_3$ has a subgroup
$H$ of order 3. Now $H$ must be abelian since 3 is prime; and also we must have
$H \trianglelefteq S_3$ since $[S_3 : H] = 2$ (using Exercise 8.7). Furthermore, $S_3/H$ has order
2, so must also be abelian. Thus the sequence $\{e\} \le H \le S_3$ shows that $S_3$ is
solvable according to Definition 22.18. So a “cubic formula” should exist.


Example 22.34. Let $G = S_4$. Then $|G| = 24$, so $G$ has at least one subgroup
of order 8, by Sylow’s Theorem.

First suppose that there is a unique Sylow 2-subgroup $H$ of $G$. In this case,
since any conjugate of $H$ is also a subgroup of $G$ of order 8, we conclude that
$H \trianglelefteq G$. Now $H$ is a 2-group, so $H$ is solvable by Exercise 22.6. The quotient
$G/H$ has order 3, so must be cyclic, and therefore also solvable. By Lemma
22.26, we can say that $G$ itself is solvable.

On the other hand, suppose that $G$ has at least two distinct subgroups $H$
and $K$ of order 8. Set $L = H \cap K$. Then $24 = |G| \ge |HK| = |H| \cdot |K| / |H \cap K|$
(by Lemma 18.9) $= 64/|L|$, so $|L| \ge 64/24 = 8/3$. Also, $|L|$ divides
8 by Lagrange’s Theorem, and $|L| < 8$ since $H \ne K$. Therefore, $|L| = 4$. Now
$L$ is normal in both $H$ and $K$ by Exercise 8.7. Thus $H, K \le N_G(L) \le G$. By
Lagrange’s Theorem, $|N_G(L)|$ is a multiple of 8 and a factor of 24, but not
equal to 8, since $H$ and $K$ are distinct subsets of $N_G(L)$ of order 8. Therefore,
$|N_G(L)| = 24$, so $N_G(L) = G$ and $L \trianglelefteq G$. Now $L$ is a 2-group, hence solvable;
and $G/L$ has order 6, so is solvable by Exercise 22.7. So by Lemma 22.26, $G$
is solvable.

We conclude that $S_4$ is solvable.




Remark 22.35. In fact, $S_4$ has 3 distinct subgroups of order 8. But notice how
we did not need to perform any calculations in $S_4$ in Example 22.34. Our
argument shows that every group of order 24 is solvable!
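For readers who do want to see the calculation in $S_4$, the count in this remark can be confirmed by exhaustive search. This is a sketch under our own assumptions: permutations are tuples on {0, 1, 2, 3}, and we rely on the fact that the subgroups of order 8 in $S_4$ can each be generated by two elements, so searching over pairs finds them all. The function names are hypothetical.

```python
from itertools import combinations, permutations


def compose(p, q):
    """Return the permutation 'apply q first, then p'."""
    return tuple(p[q[i]] for i in range(len(p)))


def generated_subgroup(generators, n):
    """Close a set of permutations under products (enough in a finite group)."""
    identity = tuple(range(n))
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements


def order_8_subgroups_of_s4():
    """Collect every subgroup of order 8 generated by a pair of elements of S_4."""
    n = 4
    group = list(permutations(range(n)))
    found = set()
    for a, b in combinations(group, 2):
        subgroup = generated_subgroup([a, b], n)
        if len(subgroup) == 8:
            found.add(frozenset(subgroup))
    return found


if __name__ == "__main__":
    subgroups = order_8_subgroups_of_s4()
    print(len(subgroups))
    print([len(h & k) for h, k in combinations(subgroups, 2)])
```

The pairwise intersections all have order 4, matching the value $|L| = 4$ obtained in the second case of Example 22.34.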


Remark 22.36. We could have omitted Examples 22.31 through 22.33, using
the knowledge that $S_1 \hookrightarrow S_2 \hookrightarrow S_3 \hookrightarrow S_4$; since $S_4$ is solvable, Lemma
22.26 assures us that the smaller $S_n$’s are solvable as well. But it is perhaps
more instructive to start small and build up to $S_4$, as we chose to do.


Example 22.37. Every time we prove another $S_n$ to be solvable, we are also
proving (via Corollary 22.28) that every polynomial of degree $n$ is solvable by
radicals (in characteristic 0, at least). Historically, the quadratic formula was
known since ancient times; the cubic formula was discovered around the start
of the Italian Renaissance; and the general quartic (fourth-degree) polynomial
was solved by radicals within another half-century after that. But the general
quintic, i.e. fifth-degree, polynomial remained unsolved through the time of
Abel and Galois, more than 250 years later. Knowing what we know, we shall
take a different tack with $S_5$ than with the previous symmetric groups.

What’s new in $S_5$? One new feature is that $S_5$ has 5-cycles. Let $H$ be the
subgroup of $S_5$ generated by 5-cycles:

$$H := \langle \{(a_1, a_2, a_3, a_4, a_5) : 1 \le a_i \le 5,\ a_i\text{’s are distinct}\} \rangle. \tag{22.6}$$

Suppose that $\sigma = (a_1, \ldots, a_5)$ and $\tau = (b_1, \ldots, b_5)$ are 5-cycles. Our strategy
is to find $\sigma$ and $\tau$ such that the commutator $\sigma\tau\sigma^{-1}\tau^{-1}$ is yet another 5-cycle;
if we can produce an arbitrary 5-cycle in this fashion, then the commutator
subgroup of $H$ will have to be $H$ itself, which will stop the descending series
of Lemma 22.30 in its tracks and force $S_5$ to be unsolvable. So we wish to
solve $\sigma\tau\sigma^{-1}\tau^{-1} = \omega$, where $\omega$ is a 5-cycle. Somewhat arbitrarily, let us try
$\omega = (b_1, b_4, b_2, b_3, b_5)$. Then we want

$$\sigma\tau\sigma^{-1} = \omega\tau = (b_1, b_3, b_2, b_5, b_4). \tag{22.7}$$

Since $\omega\tau$ is another 5-cycle, and since any two 5-cycles are conjugate in
$S_5$ (by Exercise 19.7(b)), it must be possible to solve this equation for $\sigma$; but
the question is whether $\sigma$ can be chosen to be a 5-cycle too. We recall from
Exercise 19.7(a) that the conjugate of $\tau$ by $\sigma$ is $\sigma\tau\sigma^{-1} = (\sigma(b_1), \ldots, \sigma(b_5))$.
The reader is invited at this point to find a choice for $\sigma$ which works. Because
of the importance of the result, we shall state and prove it formally:


Proposition 22.38. $S_5$ is not solvable.

Proof. Let $H$ be the subgroup of $S_5$ generated by 5-cycles:

$$H := \langle \{(a_1, a_2, a_3, a_4, a_5) : 1 \le a_i \le 5,\ a_i\text{’s are distinct}\} \rangle. \tag{22.8}$$

Let $C$ be the commutator subgroup of $H$. Let $\omega$ be an arbitrary 5-cycle
in $S_5$, and write $\omega = (c_1, c_2, c_3, c_4, c_5)$. Set $\sigma = (c_1, c_3, c_5, c_4, c_2)$ and
$\tau = (c_1, c_3, c_4, c_2, c_5)$. We compute

\begin{align}
\sigma\tau\sigma^{-1}\tau^{-1} &= (\sigma(c_1), \sigma(c_3), \sigma(c_4), \sigma(c_2), \sigma(c_5))\,\tau^{-1} \tag{22.9} \\
&= (c_3, c_5, c_2, c_1, c_4)\,\tau^{-1} \tag{22.10} \\
&= (c_1, c_2, c_3, c_4, c_5) \tag{22.11} \\
&= \omega. \tag{22.12}
\end{align}

Thus, $\omega \in C \le S_5$. So $C$ contains all 5-cycles in $S_5$, hence $H \le C$, by
construction of $H$. Now we also have $C \le H$ (in fact, $C \trianglelefteq H$). Therefore,
$C = H$. Thus, the series of commutator subgroups of $H$ stabilizes at $H$. Since
$H \ne \{e\}$, $H$ is not solvable, by Lemma 22.30. Since $H \le S_5$, Lemma
22.26 tells us that $S_5$ is not solvable either.
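The key computation (22.9) through (22.12) can be verified mechanically for every 5-cycle $\omega$. The sketch below is our own illustration, assuming the points are labelled 0 through 4, permutations are tuples, and products are composed right to left (the rightmost factor acts first), which is the convention under which the displayed computation works out. The helper names are hypothetical.

```python
from itertools import permutations


def cycle(points, n=5):
    """Permutation of {0, ..., n-1} given by the cycle (points[0], points[1], ...)."""
    perm = list(range(n))
    for i, x in enumerate(points):
        perm[x] = points[(i + 1) % len(points)]
    return tuple(perm)


def compose(p, q):
    """Return the permutation 'apply q first, then p'."""
    return tuple(p[q[i]] for i in range(len(p)))


def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def commutator_is_omega(c):
    """Check sigma tau sigma^-1 tau^-1 == omega for the choices made in the proof."""
    c1, c2, c3, c4, c5 = c
    omega = cycle((c1, c2, c3, c4, c5))
    sigma = cycle((c1, c3, c5, c4, c2))
    tau = cycle((c1, c3, c4, c2, c5))
    commutator = compose(compose(sigma, tau), compose(inverse(sigma), inverse(tau)))
    return commutator == omega


if __name__ == "__main__":
    # Every ordering (c1, ..., c5) of the five points names some 5-cycle omega.
    print(all(commutator_is_omega(c) for c in permutations(range(5))))
```

Each 5-cycle is reached five times in this loop, once for each choice of starting point, so the check covers all 24 of them.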


22.6 The Grand Finale 


At last we can completely classify the degrees n for which every polynomial 
equation of degree n over a field of characteristic zero in a single variable is 
solvable by radicals. 


Theorem 22.39. Let $n$ be a positive integer. If $n \le 4$, then every polynomial
of degree $n$ over a field of characteristic 0 is solvable by radicals. But if $n \ge 5$,
then there exists a field $F$ of characteristic 0 and a polynomial $f$ over $F$
of degree $n$ such that $f$ is not solvable by radicals over $F$. In particular, the
general fifth-degree polynomial is not solvable by radicals, even in characteristic 0.


Proof. Corollary 22.28 together with Examples 22.31 through 22.34 show that
polynomials of degree at most 4 are solvable by radicals over a field of
characteristic 0. But if $n \ge 5$, then we have $S_5 \hookrightarrow S_n$, so Proposition 22.38 and
Lemma 22.26 complete the argument.


22.7 Exercises 


Exercise 22.1. Verify that the real number @ in Example 22.3 is a radical 
expression over Q by finding an appropriate tower of field extensions starting 
with Q which satisfies the conditions of Definition 22.2. 

Exercise 22.2. Suppose that $F$ is a field, $\alpha$ is a radical expression over $F$, and
$\beta$ is a simple radical over $F[\alpha]$. Prove that $\beta$ is a radical expression over $F$.


Exercise 22.3. Let $F$ be a field, let $K$ be a finite normal extension field of
$F$, and let $f \in F[x] - \{0\}$ be a non-zero polynomial. Suppose that $L$ is an
extension field of $K$ such that $f$ factors completely in $L$, and let $R$ be the set
of all roots of $f$ in $L$. Prove that $K[R]$ is normal over $F$.


Exercise 22.4. Prove part (2) of Lemma 22.26 using the Galois correspondence 
(in the same spirit as the proof of part (1) given in the text). 


Exercise 22.5. In this exercise, you will give an alternative proof of Lemma
22.26, using “pure” group theory only. Suppose that $G$ is a finite group.

(a) Suppose that $G$ is solvable, and let $H \le G$. Let $\{e_G\} = G_0 \le G_1 \le
\cdots \le G_n = G$ be a tower of subgroups of $G$ such that $G_{j-1} \trianglelefteq G_j$ and $G_j/G_{j-1}$
is abelian. Show that $H_j := H \cap G_j$ has the same properties, and thus that $H$
is solvable.

(b) Suppose that $N \trianglelefteq G$ and both $N$ and $G/N$ are solvable. Use the
definition of a solvable group together with Exercises 7.17 and 19.3 to prove
that $G$ is solvable.


Exercise 22.6. Prove that every p-group is solvable, where p is a prime number. 
Hint: Proposition 19.14 may be helpful. 


Exercise 22.7. Let p be a prime number. Prove that every group of order 2p 
is solvable. 


Exercise 22.8. Let $K/F$ be a finite Galois extension. Set $G = \mathrm{Gal}(K/F)$.
Suppose that $F \le F_1 \le F_2 \le K$. Let $G_i = \mathrm{Gal}(K/F_i)$ for $i \in \{1, 2\}$. Prove
that $F_2$ is Galois over $F_1$ iff $G_2 \trianglelefteq G_1$, and in that case we have $\mathrm{Gal}(F_2/F_1) \cong
G_1/G_2$. (You will only need basic Galois theory from Chapter 16.)


Exercise 22.9. Let $F$ be a field of characteristic 0, and let $K$ be a finite normal
extension field of $F$. Prove or disprove that if $K$ is solvable over $F$, then $K$ is
a pure radical extension of $F$.


Exercise 22.10. Let $\sigma : G \to H$ be a surjective group homomorphism. Suppose
that $N \trianglelefteq G$, and set $K = \sigma(N)$. Note that $K \trianglelefteq H$ by Exercise 7.8. Let
$\nu : H \to H/K$ be the natural map, and set $\omega = \nu\sigma$. Prove that $\omega$ induces a
surjective group homomorphism from $G/N$ to $H/K$. Hint: See Exercises 7.9
and 9.15.


Exercise 22.11. Compute the coefficients $a_0, \ldots, a_{n-1}$ of the polynomial $f$
from Proposition 22.25 in terms of the variables $r_1, \ldots, r_n$, in the cases $n = 2$
and $n = 3$. (These polynomials are known as the elementary symmetric
functions in the $r_i$’s.)


Exercise 22.12 ($S_n$ is generated by transpositions). Let $n \in \mathbb{Z}^+$. A transposition
in $S_n$ is defined to be any 2-cycle; that is, any element $\sigma \in S_n$ such that
in cycle notation we have $\sigma = (a, b)$ for some $a, b \in \{1, 2, \ldots, n\}$ with $a \ne b$.

(a) Prove that for any $r$-cycle $(x_1, x_2, \ldots, x_r)$ with $r \ge 2$, we have

$$(x_1, x_2, \ldots, x_r) = (x_{r-1}, x_r)(x_{r-2}, x_r) \cdots (x_1, x_r).$$

Thus, an $r$-cycle is a product of $r - 1$ transpositions.

(b) Let $T = \{\sigma \in S_n : \sigma \text{ is a transposition}\}$. Prove that $\langle T \rangle = S_n$. (See
Exercise 19.6.)




Exercise 22.13 (The Alternating Group $A_n$). Let $n \in \mathbb{Z}^+$, and let $\tau = (a, b)$
be a transposition in $S_n$ (see Exercise 22.12).

(a) Suppose that $a$ and $b$ both belong to an $r$-cycle $c$, say $c = (x_1, \ldots, x_r)$
with $x_i = a$, $x_j = b$, and $1 \le i < j \le r$. Prove that we have

$$c\tau = (x_1, x_2, \ldots, x_i, x_{j+1}, x_{j+2}, \ldots, x_r)(x_{i+1}, x_{i+2}, \ldots, x_j),$$

the product of two disjoint cycles of lengths $r - (j - i)$ and $j - i$.

(b) Suppose that $a$ and $b$ belong to two disjoint cycles $c$ and $d$, say
$c = (x_1, \ldots, x_r)$ and $d = (y_1, \ldots, y_s)$ with $x_i = a$ and $y_j = b$. Note that without
loss of generality, we can suppose that $j = s$. Prove that we then have

$$cd\tau = (x_1, x_2, \ldots, x_i, y_1, y_2, \ldots, y_s, x_{i+1}, x_{i+2}, \ldots, x_r),$$

a single $(r + s)$-cycle.

(c) Define a function $f : S_n \to \mathbb{Z}$ by the formula $f(\sigma) = \sum_{i=1}^{k} (\ell_i - 1)$,
where the cycle structure of $\sigma$ is $\mathrm{cs}(\sigma) = (\ell_1, \ldots, \ell_k)$ (see Exercise 19.7).
Use the previous parts of this exercise to prove that for any $\sigma \in S_n$ and any
transposition $\tau \in S_n$, we have $f(\sigma\tau) = f(\sigma) \pm 1$, and thus $f(\sigma\tau) \equiv f(\sigma) + f(\tau)$
(mod 2). Use Exercise 22.12 to conclude that $f$ induces a homomorphism
$\pi : S_n \to (\mathbb{Z}/2\mathbb{Z}, +)$, $\pi(\sigma) = f(\sigma) + 2\mathbb{Z}$.

(d) Define $A_n := \{\sigma \in S_n : \sigma$ can be written as a product of an even
number of transpositions$\}$. Prove that $A_n = \ker(\pi)$ and that if $n \ge 2$, we have
$[S_n : A_n] = 2$. Note that we have $A_n \trianglelefteq S_n$. We call $A_n$ the alternating group
on the set $\{1, \ldots, n\}$.

(e) Suppose that $r$ is an odd integer and $3 \le r \le n$. Let $H_r$ be the
subgroup of $S_n$ generated by all $r$-cycles. Prove that $H_r = A_n$. Thus, the
group $H$ considered in the proof of Proposition 22.38 is none other than $A_5$,
and we can conclude that $A_5$ is not solvable.


Exercise 22.14 (An inseparable extension). Let $F = \mathbb{Z}/p\mathbb{Z}$ be the field of
integers modulo $p$, where $p$ is prime in $\mathbb{Z}$. Let $K = F(t)$ be the field of
rational functions over $F$ in the variable $t$. Let $f = x^p - t \in K[x]$, and let $L$
be a splitting field for $f$ over $K$.

(a) Prove that $f$ is irreducible in $K[x]$.

(b) Let $\alpha$ be a root of $f$ in $L$. Prove that $f = (x - \alpha)^p$. Conclude that
$f = \mathrm{Irr}(\alpha, K, x)$ and thus that $L$ is not separable over $K$; we say that $L$ is an
inseparable extension of $K$. Note that $L$ is a solvable, normal extension of $K$,
but $L$ is not Galois over $K$.


Exercise 22.15. Suppose that $K/F$ is any finite extension of finite fields (that
is, $F \le K$ are finite fields and $[K : F] < \infty$). Prove that $K$ is a solvable
extension of $F$.


Exercise 22.16. Suppose that $F$ is a field, and $\alpha$ is a radical over $F$; say $\alpha^n \in F$
with $n \in \mathbb{Z}^+$. Also suppose that the polynomial $x^n - 1$ factors completely over
$F$.

(a) Let $f = x^n - \alpha^n \in F[x]$, and suppose that $f = f_1 f_2$ with $f_i \in F[x]$.
Set $d_i = \deg(f_i)$. Prove that we have $\alpha^{d_i} \in F$.

(b) Let $m = \min\{j \in \mathbb{Z}^+ : \alpha^j \in F\}$. Prove that $m$ divides $n$ and that
$\mathrm{Irr}(\alpha, F, x) = x^m - \alpha^m$.

(c) Suppose that $m = ab$ with $a, b \in \mathbb{Z}$ and $1 < a, b < m$. Let $\beta = \alpha^a$.
Prove that $\mathrm{Irr}(\beta, F, x) = x^b - \beta^b$ and $\mathrm{Irr}(\alpha, F[\beta], x) = x^a - \alpha^a$.

(d) Use the preceding parts of this exercise to prove that if $m > 1$ and
$m = \prod_{j=1}^{t} p_j$, where each $p_j$ is prime, then we can write $F = F_0 < F_1 < \cdots <
F_t = F[\alpha]$ with $F_j = F_{j-1}[\alpha^{m_j}]$, $m_j := m/(p_1 p_2 \cdots p_j)$, $[F_j : F_{j-1}] = p_j$, and
$\mathrm{Irr}(\alpha^{m_j}, F_{j-1}, x) = x^{p_j} - \alpha^{m_j p_j}$.

Exercise 22.17. Suppose that $K$ is a pure radical extension field of $F$, with
$\mathrm{char}(F) = p > 0$. Also suppose that for each positive integer $d$, the polynomial
$x^d - 1$ factors completely over $F$.

(a) Prove that we can write $F = L_0 < L_1 < \cdots < L_t = K$, where
$L_j = L_{j-1}[\alpha_j]$ and $\mathrm{Irr}(\alpha_j, L_{j-1}, x) = x^{p_j} - \alpha_j^{p_j}$ for some elements $\alpha_j \in K$ and
primes $p_j$. (Use Exercise 22.16.)

(b) Suppose that p; = p and pj41 =q # p. Set a= a; and 8 = aj;41, so 
that L; = Lj-1[a], Lj41 = L; [6], rr(a, Lj-1, 2) = x? —a?, and Irr(6, Lj, x) = 
x1 — 67, Note that we have 64 = f(a) for some polynomial f € L;_1[z]. Set 
b= f(a”). Prove that we have Irr(8?, L;_1,2) = x? — b, Irr(8, L;-1[8"], x) € 
{x?— 6, «—B}, and Irr(a, L;_1[6], x) € {z?—a”, x—a}. Use this to show that 
without loss of generality, we may suppose that all of the degree-p extensions 
in the tower of L;’s occur in a consecutive sequence at the top of the tower. 

(c) Use Lemma 15.25 and Exercise 14.11 to show that every element $\gamma \in K$
can be written as $\gamma = h(\alpha_1, \ldots, \alpha_t)$ for a unique polynomial $h \in F[x_1, \ldots, x_t]$
such that $\deg_{x_j}(h) < p_j$ for each $j$.

(d) Let $S = \{j : 1 \le j \le t \text{ and } p_j \ne p\}$. Let $\gamma \in K$, and let $S_r =
S \cup \{1, 2, \ldots, t - r\}$ for $r \in \mathbb{N}$. Prove by induction on $r$ that we have $\gamma^{p^r} \in
F[\{\alpha_j : j \in S_r\}]$. Hint: Use Corollary 21.21.

(e) Prove that for all $\gamma \in K$, the index $[F[\gamma^{p^t}] : F]$ is not a multiple of $p$.


Exercise 22.18. Let $F$ be an algebraically closed field of characteristic 2. Let
$K = F(t)$ be the field of rational functions over $F$ in the variable $t$, and let
$f = x^2 + x + t \in K[x]$. Let $L$ be a splitting field for $f$ over $K$, and let $\alpha$ be a
root of $f$ in $L$.

(a) Prove that $L$ is a Galois extension of $K$ with $[L : K] = 2$, so that
$\mathrm{Gal}(L/K) \cong S_2$ is a solvable group.

(b) Prove by induction on $n$ that we have $\alpha^{2^n} = \alpha + t + t^2 + t^4 + \cdots + t^{2^{n-1}}$
for all $n \in \mathbb{N}$. Use this formula to show that $K[\alpha^{2^n}] = L$ for all $n \in \mathbb{N}$.

(c) Prove that $L$ is not a solvable extension of $K$. (Exercise 22.17 may be
useful.)


Exercise 22.19. If the roots of a monic polynomial are algebraically independent
over a given field, then it makes sense that the coefficients of that
polynomial should also be algebraically independent over that field. But
prove this. Specifically, under the hypotheses of Proposition 22.25, prove
that $C[a_0, \ldots, a_{n-1}]$ is a polynomial ring in $n$ variables over $C$; that is,
$\{a_0, \ldots, a_{n-1}\}$ is algebraically independent over $C$. Suggestion: Form a legitimate
polynomial ring $C[A_0, \ldots, A_{n-1}] =: T$, let $h = x^n + \sum_{j=0}^{n-1} A_j x^j \in T[x]$,
let $A$ be a splitting field for $h$ over $k(T)$, let $R_1, \ldots, R_n$ be the roots of $h$ in
$A$, and consider the map sending $r_j$ to $R_j$ over $C$ (see Exercise 14.11).


Exercise 22.20. Under the hypotheses of Proposition 22.25, use the following 
steps to prove that Sp := SON F = Clao,...,@n_1] =: T. 

(a) Prove that TC Sp. 

(b) Prove that for each d € N, the sets 


Hg :={w € So : deg,,(t) = d for every term t of w} 


and 
Ha:={y eT : deg,o,(t) =d for every term t of y} 


are C-vector spaces, where deg,,, denotes the total degree, that is, the sum 
of the degrees in 11,...,7n. Note: a polynomial in which every term has the 
same total degree is called homogeneous. 

(c) Prove that the set {aG°---a,"q' : e; EN, o(n— j)e; = d} is a basis 
of Hg over C. Note: Exercise 22.19 may be useful here. 

(d) Prove that the set {W(rj'---rf") : e; € N,ey <--- < en, le; = 
d} spans Hg over C, where y : S —+T is given by the formula ~(w) = 
ceg 7(w) and G := Gal(K/F). 

(e) Show that the sets {(e1,...,€n) : e; € N,>o(n—jje; = d} and 
{(€1,---,€n) : e7 € N,e1 <--: < €n, >> e; = d} have the same size. Note: 
this exercise would be much easier if we knew that every element of S were 
integral over [, but this is a bit beyond our scope. 


Exercise 22.21. A curious student once observed that when we derived the 
quadratic formula in Example 22.32, we assumed that our quadratic poly- 
nomial was irreducible, and yet the quadratic formula, as derived in Example 
22.8, works for all quadratic polynomials over a field of characteristic different 
from 2; indeed, we say the quadratic formula, implying there is just one! Why 
should this be true, and does it generalize to cubics and quartics? We explore 
these questions in this exercise. 

Adopt the hypotheses of Proposition 22.25, with the additional assumptions
that $C$ is algebraically closed and of characteristic 0. Further, suppose
that $f$ is solvable by radicals over $C$, which we now know is equivalent to saying
$n \le 4$. Let $g \in C[x]$ be monic of degree $n$, and write $g = x^n + \sum_{j=0}^{n-1} \alpha_j x^j$
with $\alpha_j \in C$. Write $F = F_0 < F_1 < \cdots < F_t = K$ where $F_j/F_{j-1}$ is Galois
of order $p_j$, a prime number, which is possible since $K/F$ is a solvable Galois
extension. Let $S_j = S \cap F_j$.

(a) Prove that for all $\sigma \in \mathrm{Gal}(F_j/F_{j-1})$, we have $\sigma(S_j) \subseteq S_j$.

(b) Prove that every element of $S_j$ is a $C$-linear combination of $p_j$-th roots
of elements of $S_{j-1}$, with each radical in $S_j$. Hint: Exercise 14.9 will be useful;
look at the technique used in the proof of Proposition 22.22.

(c) Write $g = \prod_{j=1}^{n} (x - \beta_j)$ with $\beta_j \in C$. Let $\tau : S \to C$ be the ring
homomorphism over $C$ sending $r_j$ to $\beta_j$. Prove that $\tau(a_j) = \alpha_j$.

(d) Deduce that we can write a root of $g$ using the “same” radical expression
that works for $f$, only substituting the coefficients of $g$ for those of $f$.
(Exercise 22.20 may help here.)


Exercise 22.22. Derive a “cubic formula,” i.e., a formula for the roots of a 
general cubic polynomial involving radical expressions. You may suppose that 
the coefficients of the polynomial lie in a field whose characteristic does not 
divide 6. 




23 


Projects 


23.1 Gyrogroups 


Prerequisites. Chapter 3 at a minimum; it would be better to have gone 
through Chapter 9 to understand the notion of automorphism in the context 
of group theory. 


When we chose the axioms that define groups (or other algebraic struc- 
tures such as rings and vector spaces), what led us to these particular choices? 
What if we had chosen a different set of axioms? Experience has shown that the 
group axioms make a good choice because of the wide variety of applications 
(so many things turn out to be groups) combined with the deep consequences 
(groups have a lot of structure). But no one can stop us from selecting an- 
other set of axioms and naming a new kind of algebraic structure. Many such 
structures have been investigated, and in general each new kind of structure 
gets its own terminology and its own theory. 

In this project, we explore a type of structure called a gyrogroup which 
was proposed around 1992. Before we can give the definition of a gyrogroup, 
we need to discuss automorphisms of objects more general than groups. 


Definition 23.1. Let $+$ be a binary operation on a set $G$. An automorphism
of $(G, +)$ is a bijective function $\alpha : G \to G$ such that $\forall x, y \in G$, $\alpha(x + y) =
\alpha(x) + \alpha(y)$. The set of all automorphisms of $(G, +)$ is written $\mathrm{Aut}(G, +)$.


Task 23.1.1. Suppose that $\alpha, \beta \in \mathrm{Aut}(G, +)$. Prove that $\alpha^{-1}$ and $\beta \circ \alpha$ are
also in $\mathrm{Aut}(G, +)$.


We next present the original definition of a gyrogroup as stated by Ungar
in [15]. We follow this work in using the notation $+$ for the binary operation,
even though this operation may not be commutative. We also agree to write
the image of an element $z$ under an automorphism $\alpha$ as $\alpha z$ instead of $\alpha(z)$,
to reduce the number of parentheses needed.


Definition 23.2. A gyrogroup is a triple $(G, +, g)$ where $+$ is a binary operation
on $G$ and $g : G^2 \to \mathrm{Aut}(G, +)$, satisfying the following axioms for all
$x, y, z \in G$:

G1. $x + y \in G$

G2a. $x + (y + z) = (x + y) + g(x, y)z$

G2b. $(x + y) + z = x + (y + g(y, x)z)$

G3. $x + y = g(y, x)(y + x)$

G4. $(G, +)$ has an identity element, 0

G5. $x$ has an inverse element, written $-x$

G6. $g(0, y) = \mathrm{id}_G$, the identity function on $G$

G7. $g(x + y, y) = g(x, y)$

G8. $g(x, y)z = -(x + y) + (x + (y + z))$


DOI: 10.1201/9781003252139-23


Remark 23.3. The + operation of a gyrogroup is not required to be associative. 
Indeed, + is associative if g(x,y) is the identity function on G for all x,y, and 
we shall see that the converse also holds. 
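To make the definition concrete, it may help to test a few axioms numerically on an example. The sketch below is our own illustration and is not taken from the text: it assumes the open unit disc in the complex numbers with Möbius addition $x + y := (x + y)/(1 + \bar{x} y)$ and with $g(x, y)$ acting as multiplication by $(1 + x\bar{y})/(1 + \bar{x} y)$, a standard example from the gyrogroup literature. It checks Axioms G2a, G6, G7, and G8 at sample points in floating point, and confirms that the operation is not associative there. The names `oplus`, `gyr`, and `check_axioms` are hypothetical.

```python
def oplus(a, b):
    """Moebius addition on the open unit disc (assumed example operation)."""
    return (a + b) / (1 + a.conjugate() * b)


def gyr(a, b):
    """Return the map z -> g(a, b) z; here g(a, b) is a rotation of the disc."""
    factor = (1 + a * b.conjugate()) / (1 + a.conjugate() * b)
    return lambda z: factor * z


def close(u, v, tol=1e-12):
    """Floating-point equality up to a small tolerance."""
    return abs(u - v) < tol


def check_axioms(x, y, z):
    """Evaluate several gyrogroup axioms at the sample points x, y, z."""
    g_xy_z = gyr(x, y)(z)
    left_nested = oplus(x, oplus(y, z))
    return {
        # G2a: x + (y + z) = (x + y) + g(x, y) z
        "G2a": close(left_nested, oplus(oplus(x, y), g_xy_z)),
        # G6: g(0, y) is the identity map
        "G6": close(gyr(0j, y)(z), z),
        # G7: g(x + y, y) = g(x, y)
        "G7": close(gyr(oplus(x, y), y)(z), g_xy_z),
        # G8: g(x, y) z = -(x + y) + (x + (y + z))
        "G8": close(oplus(-oplus(x, y), left_nested), g_xy_z),
        # Plain associativity fails at these points.
        "associative": close(left_nested, oplus(oplus(x, y), z)),
    }


if __name__ == "__main__":
    print(check_axioms(0.3 + 0.4j, -0.5 + 0.2j, 0.1 - 0.6j))
```

A numerical check at a few points is of course not a proof; it only illustrates how the map $g(x, y)$ repairs the failure of associativity in Axiom G2a.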


When we define a new kind of structure, we usually want to give the fewest 
axioms possible to get the job done. In addition to saving space, this allows 
us to establish more easily that a given object satisfies these axioms. It is 
also pleasing to understand when some axioms follow logically from others 
and thus are redundant. In the case of gyrogroups, it turns out that some 
of the original axioms are redundant. To prove this, we will define a “weak 
gyrogroup” to be a gyrogroup with some of the axioms removed, and then you 
will prove that every weak gyrogroup is actually a gyrogroup. 


Definition 23.4. A weak gyrogroup is a triple $(G, +, g)$ where $+$ is a binary
operation on $G$ and $g : G^2 \to \mathrm{Aut}(G, +)$, satisfying Axioms G2a, G3, G4,
G5, and G7.


Notice that we removed Axioms G1, G2b, G6, and G8 from the list of 
gyrogroup axioms when we defined a weak gyrogroup. Thus every gyrogroup 
is also a weak gyrogroup, and our goal is to establish the converse. 

To start with, observe that Axiom G1 says that G is closed under +, which 
is guaranteed from the hypothesis that + is a binary operation on G. Thus, 
we automatically have: 


Lemma 23.5. Every weak gyrogroup satisfies Axiom G1. 


We can also guarantee the uniqueness of the 0 element based on Lemma 
3.33: 


Lemma 23.6. In a weak gyrogroup, the identity element is unique. 


A natural next question is whether inverse elements are unique in a weak 
gyrogroup. Looking at the corresponding result for groups, Lemma 3.34, we 
may be disappointed to see that the proof relies on the associative property, 
which we don’t expect to have in a weak gyrogroup. But we will not let this 
stop us yet. The key idea is to first establish a cancellation law: 


Task 23.1.2. Prove that the Left Cancellation Law is true in a weak gyrogroup:
that is, prove

Lemma 23.7 (Left Cancellation Law). Let $G$ be a weak gyrogroup. Suppose that
$a, b, c \in G$ and $c + a = c + b$. Then $a = b$.

Hint: Use Axiom G2a for appropriate choices of $x$, $y$, and $z$, together with
the fact that automorphisms are bijective.

We get the following corollary immediately:

Corollary 23.8. In a weak gyrogroup, inverses are unique. More precisely,
let $G$ be a weak gyrogroup and let $a \in G$. If $b, c \in G$ and $a + b = 0 = a + c$,
then $b = c$.


Now we are justified in using the notation —a for the inverse of a in a weak 
gyrogroup. The following task should also be straightforward now. 


Task 23.1.3. Let $a \in G$, where $G$ is a weak gyrogroup. Prove that $-(-a) = a$.

We can further exploit Left Cancellation to get another Gyrogroup Axiom:


Task 23.1.4. Let G be a weak gyrogroup. Prove Axiom G6 for G. Hint: Use 
Axiom G2a and Left Cancellation. 


From this, we get the following: 


Corollary 23.9. Let $G$ be a weak gyrogroup, and let $a \in G$. Then we have
$g(-a, a) = \mathrm{id}_G$.


Proof. We have $g(-a, a) = g(-a + a, a)$ by Axiom G7. Now use Axiom G6
(which is justified by Task 23.1.4) to get the desired conclusion.


We are now in a position to begin to solve equations in a weak gyrogroup.
To solve for $x$ in the equation

$$a + x = b,$$

it is natural to add $-a$ on the left to both sides. The next task shows that
this always works.


Task 23.1.5. Let $G$ be a weak gyrogroup, and let $a, b \in G$. Prove that for all
$x \in G$, we have $-a + (a + x) = x$. Conclude that the equation $a + x = b$ has
the unique solution $x = -a + b$. Note that you must show that this choice of
$x$ does satisfy the given equation, and is the only possible solution.


The fact that the + operation need not be associative can present chal- 
lenges to our notation as well as to our habits of thinking (since we have 
perhaps become used to all of our operations being associative). Recall the 
discussion of parenthesization from Chapter 3 beginning with Expression 3.2, 
and also in Exercise 3.10. To make our notation clearer, we will use the left- 
addition functions defined by elements of G, as in Cayley’s Theorem (Theorem 
10.9): 


Definition 23.10. Let $G$ be a set with a binary operation $+$. For an element
$x \in G$, we define the left-addition function associated with $x$ to be the function
$\tau_x : G \to G$ given by the formula $\tau_x(y) = x + y$ for $y \in G$.


With this notation, for example, we can write $x + (y + z)$ as

$$\tau_x \tau_y(z) \tag{23.1}$$

where the (unwritten) operation between $\tau_x$ and $\tau_y$ is composition of functions.
Notice that $(x + y) + z$ can be written as $\tau_{x+y}(z)$, but not in general as
$\tau_x \tau_y(z)$ unless $+$ is associative. To practice using this notation, we start with
an easy task.


Task 23.1.6. Let $G$ be a weak gyrogroup. Prove that $\tau_0 = \mathrm{id}_G$, the identity
function on $G$.


In the context of group theory, the functions $\tau_x$ were permutations of the
underlying set of the group. Next you will prove the same property for weak
gyrogroups.

Task 23.1.7. Let $G$ be a weak gyrogroup. Prove that for all $a \in G$, we have
$\tau_a \tau_{-a} = \tau_{-a} \tau_a = \mathrm{id}_G$. (You should find the result of Task 23.1.5 helpful.)
Conclude that each $\tau_a$ is a permutation of the set $G$, and $\tau_{-a} = \tau_a^{-1}$.

Now that we know the $\tau$ functions are invertible, we can establish another
gyrogroup axiom.


Task 23.1.8. Let $G$ be a weak gyrogroup. Show that for all $x, y \in G$, we have
$\tau_x \tau_y = \tau_{x+y}\, g(x, y)$ by using Axiom G2a. Solve for $g(x, y)$ in this equation,
and use the result to prove that Axiom G8 holds in $G$.


We are also able to establish our earlier statement from Remark 23.3 about
when the $+$ operation is associative, even in a weak gyrogroup:

Task 23.1.9. Let $G$ be a weak gyrogroup. Prove that $+$ is associative iff for
all $x, y \in G$, we have $g(x, y) = \mathrm{id}_G$.

The following task helps to explain why we use additive notation in gy- 
rogroups. 


Task 23.1.10. Let $G$ be a set with an associative binary operation $+$. Prove
that the following are equivalent:

(1) $(G, +)$ is an abelian group;

(2) $(G, +, \mathrm{id}_G)$ is a gyrogroup;

(3) $(G, +, g)$ is a gyrogroup for some function $g : G^2 \to \mathrm{Aut}(G)$.


We will next establish a few identities to help prepare for the proof of 
Axiom G2b, which is the only axiom left. 


Task 23.1.11. Let $G$ be a weak gyrogroup.

(a) Prove that for all $x, y \in G$, we have

$$\tau_{(x+y)+y}^{-1}\, \tau_{x+y} = \tau_{x+y}^{-1}\, \tau_x.$$

(Axiom G7 and the formula for $g(x, y)$ from Task 23.1.8 may be helpful.)

(b) Use the result of part (a) above to prove that for all $x, y \in G$, we have

$$\tau_{b+(a+b)} = \tau_b \tau_a \tau_b, \tag{23.2}$$

where $a = -x$ and $b = x + y$.

(c) Prove that for all $a, b \in G$, there exist $x, y \in G$ such that $a = -x$ and
$b = x + y$. Conclude that Equation 23.2 is true for all $a, b \in G$.

(d) Evaluate Equation 23.2 at an element $c \in G$ to show that for all
$a, b, c \in G$, we have

$$(b + (a + b)) + c = b + (a + (b + c)). \tag{23.3}$$


Task 23.1.12. Let $G$ be a weak gyrogroup.

(a) Prove that Axiom G2b holds in $G$ if and only if for all $x, y \in G$, we
have $g(y, x) g(x, y) = \mathrm{id}_G$. Hint: In Axiom G2a, we can replace $z$ with $g(y, x)z$.

(b) Use the previous results to prove that Axiom G2b does hold in $G$.


Putting together what we have proved, we can now state: 


Proposition 23.11. Every weak gyrogroup is a gyrogroup. 


Remark 23.12. In [14], Sabinin noted that the original axioms proposed by 
Ungar in [15] contained redundancies. The techniques used in [14], however, 
will likely not be familiar to the reader without a background in an area of 
algebra known as loop theory; see [9] for this. Thus the present project is an 
attempt to accomplish the main goals of [14] without the use of loop theory. 


23.2 Kaleidoscopes 
Prerequisites. Chapter 5. 


Since the “Kaleidoscope Principle” is a theme of this book, we owe it to 
ourselves to understand how actual kaleidoscopes work. It’s all done with mir- 
rors! To start with, we will therefore study the physics and optics of mirrors, 
but with a very limited point of view, and from a mathematical framework. 

The setting is a 3-dimensional space $\mathbb{R}^3$. We assume that the following
objects exist in our space:

(1) The ground, which we assume is an infinite plane; for convenience,
we take the ground to be the plane $z = 0$, i.e., the $xy$-plane, which we write
as $\mathbb{R}^2$. We may think of each point of the ground as having a color.

(2) Rays of light. We assume that these are points (physicists call them
photons) which travel in a straight line until they hit other objects.

(3) A viewer. We assume the viewer is located at a fixed position above
the ground. We really only care about one of the viewer’s eyes which we
assume is observing the ground; this corresponds to the fact that a typical
kaleidoscope only lets the viewer peek into the instrument with one eye at
a time. We take the viewer’s eye position to be $(0, 0, h)$ for a fixed number
$h > 0$.


FIGURE 23.1: The Law of Reflection: side view. (Labels: angle of incidence $\alpha$; angle of reflection $\beta$; Law of Reflection: $\alpha = \beta$.)


FIGURE 23.2: The Law of Reflection: perspective view. (Labels: angle of incidence $\angle APQ =: \alpha$; angle of reflection $\angle BPR =: \beta$; Law of Reflection: $\alpha = \beta$.)


(4) Mirrors. We assume that each mirror is of the form T x (0, h] where 
T C R?. That is, each mirror is the vertical extension of a 2-dimensional shape 
just above the ground up until eye-level. For now, we will restrict T to be a 
line segment, so the mirror is a vertical rectangle. 

The way that light behaves when it hits a mirror is described by the Law of 
Reflection, which says that the angle of incidence equals the angle of reflection 
(see Figure 23.1). More specifically, suppose that a photon hits a flat mirror 
at point P; let α be the angle made by the incoming light ray with the mirror; 
let A be a point on the incoming light ray (not equal to P), and let V be a 
point directly above P with respect to the mirror (that is, the line segment 
VP is perpendicular to the mirror). Then the outgoing light ray also makes 
angle α with the mirror, and lies in the plane containing the points A, V, and 
P (see Figure 23.2). 
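
As a quick numerical illustration of the Law of Reflection, here is a minimal sketch. The function names and the sample vectors are our own choices for illustration and are not notation from the text. It assumes the mirror is described by a unit normal vector n, and it uses the standard formula d - 2(d·n)n for the reflected direction; the angle each ray makes with the mirror then comes out the same.

```python
import math

def reflect_direction(d, n):
    """Reflect direction d off a flat mirror with unit normal n: d - 2(d.n)n."""
    dot = sum(di * ni for di, ni in zip(d, n))
    return tuple(di - 2 * dot * ni for di, ni in zip(d, n))

def angle_with_mirror(d, n):
    """Angle (radians) between direction d and the mirror plane with unit normal n."""
    dot = sum(di * ni for di, ni in zip(d, n))
    norm = math.sqrt(sum(di * di for di in d))
    return math.asin(abs(dot) / norm)

# Hypothetical example: a vertical mirror lying in the plane x = 0.
n = (1.0, 0.0, 0.0)
incoming = (-1.0, 2.0, -0.5)
outgoing = reflect_direction(incoming, n)

alpha = angle_with_mirror(incoming, n)   # angle of incidence
beta = angle_with_mirror(outgoing, n)    # angle of reflection
print(outgoing, math.isclose(alpha, beta))
```

The reflected direction is a combination of d and n, so it stays in the plane spanned by the incoming ray and the perpendicular to the mirror, as the text requires.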

Sometimes the effect of a mirror is described by saying that the viewer’s 
eye (or brain) is “fooled” into thinking that the view they see in the mirror 
represents an actual scene behind the mirror, a scene from a “mirror world,” 
because the brain expects light to travel in a straight line and is not pro- 
grammed to understand mirrors. But this idea of an actual scene existing 
behind the mirror is very useful (although we stop short of believing that, like 
Alice, one can step through the looking-glass). It is not just people’s brains 
that see this effect: if for example we put a camera in place of the viewer’s 
eye, then the camera would record the scene as if the mirror world existed; 
mathematically, it produces the same result. To show this is your first task: 




FIGURE 23.3: The Mirror-World Effect: single mirror; the region A is shaded 


Task 23.2.1. Suppose that our scene S contains just one mirror M = T × (0, h], 
where T is a line segment. Let A be the subset of R^2 which is behind the mirror 
with respect to the viewer: that is, the union of the rays from the origin through 
the points of T, minus the line segments from the origin to T; see Figure 23.3. 
Let ℓ be the line containing T, and let ρ : R^2 → R^2 be reflection about the 
line ℓ. Define f : R^2 → R^2 by 

        f(P) = P       if P ∉ A; 
        f(P) = ρ(P)    if P ∈ A. 

Show that a light ray traveling on the line from the viewer’s eye toward a point 
P ∈ R^2 will end up hitting the point f(P) on the ground. Conclude that the 
viewer will see exactly the same view if we remove the mirror M and replace 
the color of each point P ∈ R^2 with the color of f(P). Note that light seen 
by the viewer must actually travel in the opposite direction, from the ground 
to the eye, but light paths are reversible! Also note that in Figure 23.3, the 
“snaky” horizontal line and the “42” are in the “real” world, directly viewed 
by the observer, while the vertical snake and the backwards vertical 42 are 
reflected images. In particular, note that the reflected snake ends abruptly at 
the boundaries of A; this corresponds to the fact that the function f is not 
continuous. 
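
The function f of Task 23.2.1 can be sketched in code as follows. This is only an illustration under stated assumptions: the mirror segment T is given by two hypothetical endpoints T0 and T1 in the ground plane, the viewer sits above the origin, and the names cross, reflect_about_line, and behind_mirror are ours. The sketch computes f; it does not prove the claim of the task.

```python
def cross(u, v):
    """2D cross product (the z-component of u x v)."""
    return u[0] * v[1] - u[1] * v[0]

def reflect_about_line(P, T0, T1):
    """The reflection rho of P about the line through T0 and T1."""
    d = (T1[0] - T0[0], T1[1] - T0[1])
    w = (P[0] - T0[0], P[1] - T0[1])
    t = (w[0] * d[0] + w[1] * d[1]) / (d[0] * d[0] + d[1] * d[1])
    foot = (T0[0] + t * d[0], T0[1] + t * d[1])  # foot of the perpendicular from P
    return (2 * foot[0] - P[0], 2 * foot[1] - P[1])

def behind_mirror(P, T0, T1):
    """True if P is in A: the segment from the origin to P crosses T before reaching P."""
    d = (T1[0] - T0[0], T1[1] - T0[1])
    denom = cross(P, d)
    if denom == 0:               # the ray through P is parallel to the mirror line
        return False
    s = cross(T0, d) / denom     # the ray meets the mirror line at the point s * P
    u = cross(T0, P) / denom     # that point equals T0 + u * d
    return 0 < s < 1 and 0 <= u <= 1

def f(P, T0, T1):
    """f(P) = rho(P) if P is behind the mirror, and f(P) = P otherwise."""
    return reflect_about_line(P, T0, T1) if behind_mirror(P, T0, T1) else P

# Hypothetical mirror: the segment from (1, -1) to (1, 1).
T0, T1 = (1, -1), (1, 1)
print(f((2, 0), T0, T1))    # behind the mirror, so it is reflected
print(f((0.5, 0), T0, T1))  # in front of the mirror, so it is unchanged
print(f((2, 5), T0, T1))    # the ray misses the mirror, so it is unchanged
```

Points just inside and just outside the boundary of A are sent far apart, which is the discontinuity of f mentioned in the task.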


To get more interesting effects, we can use more than one mirror at a time. 
With multiple mirrors, a light ray may be reflected (or “bounce”) multiple 
times on its journey from the ground to the viewer. This lets us see the real 
power of the mirror world concept. Note that our assumptions imply that 
for any finite number of mirrors, any given light ray will bounce only finitely 
many times between the ground and the viewer’s eye. 


Task 23.2.2. Suppose that the scene S contains any finite number of mirrors. 
Let ℳ be the set of all mirrors M which can be seen directly by the viewer: 
that is, the line segments from the origin to the points on M intersect no other 
mirrors (in case of a partially blocked mirror, we can divide the mirror into 
two smaller mirrors and then use this definition). Define a function f as in 
Task 23.2.1 but using ℳ instead of M. Show that the result of Task 23.2.1 is 
still true if in addition to replacing colors, we also replace mirrors according 
to the function f: so we place a mirror above point Q iff a mirror occurs above 
point f(Q). In other words, the mirror world can contain mirrors too! 


FIGURE 23.4: Standard kaleidoscope configuration 


We are now ready to discuss kaleidoscopes. A standard kaleidoscope can 
be described within our framework as a group of 3 mirrors that form an 
equilateral triangle centered at the origin when viewed from above (see Figure 
23.4). In the figure, only the white area inside the triangle is directly viewed 
by the observer; the shaded area will consist entirely of reflections from this 
central area. To be definite, we take the equilateral triangle to have sides of 
length 1, so the height of the triangle is √3/2. By using the result of Task 
23.2.2 repeatedly, we can understand the visual effects that this configuration 
of mirrors will produce. 


Task 23.2.3. Show that the kaleidoscope shown on the left side of Figure 23.5 
is equivalent to the configuration shown on the right side of that figure. Note 
that the 3 triangular regions outside of the original triangle in the right-hand 
illustration represent the part of the final image produced by light rays which 
took exactly 1 bounce. The line segments at the boundary of the shaded 
region represent mirrors; reflected images have not been shown in the shaded 
region. The image “42” is for illustrative purposes only; your solution should 
be general, giving a function k that plays the role of the function f in Task 
23.2.1; here, you need only define k in the unshaded region. 


Task 23.2.4. Continue Task 23.2.3 to define a function k on the region 
corresponding to 2 bounces; 3 bounces; and 4 bounces (by now you should get the 
big picture). Show that the final image corresponds to a partition of the plane 
into equilateral triangles, each of which is a (multiple) reflection of the 
original triangle; and that the number of bounces taken to reach a point P ∈ R^2 
is equal to the number of times that the line from the origin to P intersects 
the edges of these triangles (you may ignore points on these edges; technically 
we have a “tiling” of the plane instead of a partition, since the edges cause a 
slight problem). 




FIGURE 23.5: Standard kaleidoscope image: with no bounces (left) and after 
one bounce (right) 


Task 23.2.5. If you have studied (pre)calculus, then show that the final func- 
tion k from Task 23.2.4 is continuous. This is one reason that the standard 
kaleidoscope uses the configuration that it does. However, show that a non- 
standard kaleidoscope made using 6 mirrors arranged in a regular hexagon 
centered at the origin results in a discontinuous k function; this has an un- 
pleasant visual effect. 


Task 23.2.6. Use the result of Task 23.2.4 to prove that the function 
σ : R^2 → R^2 given by the formula 


        σ(x, y) = (x + 1.5, y + √3/2) 


is a symmetry of the final kaleidoscope image with respect to R^2. Then show 
the same of the function 


        τ(x, y) = (x + 1.5, y − √3/2). 
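
The following sketch gives a numerical sanity check (not a proof) of Tasks 23.2.4 and 23.2.6. It assumes the conclusion of Task 23.2.4, namely that the final image is the tiling obtained by repeatedly reflecting the original triangle. Under that assumption, k(P) can be found by "folding" P back into the original triangle, reflecting across the line through any side that separates P from the triangle. The names VERTICES, side, reflect, and the tolerance 1e-12 are our own choices.

```python
import math

S3 = math.sqrt(3)
# Vertices of the original triangle (side 1, centered at the origin), counterclockwise.
VERTICES = [(-0.5, -S3 / 6), (0.5, -S3 / 6), (0.0, S3 / 3)]
EDGES = [(VERTICES[i], VERTICES[(i + 1) % 3]) for i in range(3)]

def side(P, A, B):
    """Positive when P lies on the interior side of the line through A and B."""
    return (B[0] - A[0]) * (P[1] - A[1]) - (B[1] - A[1]) * (P[0] - A[0])

def reflect(P, A, B):
    """Reflect P about the line through A and B."""
    d = (B[0] - A[0], B[1] - A[1])
    t = ((P[0] - A[0]) * d[0] + (P[1] - A[1]) * d[1]) / (d[0] ** 2 + d[1] ** 2)
    foot = (A[0] + t * d[0], A[1] + t * d[1])
    return (2 * foot[0] - P[0], 2 * foot[1] - P[1])

def k(P, max_steps=10_000):
    """Fold P into the original triangle by reflecting across the lines through its sides."""
    for _ in range(max_steps):
        for A, B in EDGES:
            if side(P, A, B) < -1e-12:   # P is on the wrong side of this edge
                P = reflect(P, A, B)
                break
        else:
            return P                      # P is inside the triangle
    raise RuntimeError("folding did not terminate")

def sigma(P):
    return (P[0] + 1.5, P[1] + S3 / 2)

def tau(P):
    return (P[0] + 1.5, P[1] - S3 / 2)

P = (0.1, 0.05)                            # a sample point inside the original triangle
for g in (sigma, tau):
    Q = k(g(P))
    print(math.isclose(Q[0], P[0], abs_tol=1e-9), math.isclose(Q[1], P[1], abs_tol=1e-9))
```

If k(σ(P)) and k(P) agree for every P, then P and σ(P) receive the same color, which is what it means for σ to be a symmetry of the image. The check above tests this only at one sample point, up to floating-point error.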


Task 23.2.7. Let C = (−1/2, −√3/6), the bottom-left corner of the original 
triangle. Define a new u,v-coordinate system in the plane by translating C to 
the origin: that is, by setting u = x + 1/2 and v = y + √3/6. Prove that the 
dihedral group D_6 is contained in the symmetry group of the kaleidoscope image 
in the u,v-coordinate system. (This author has noticed that when actually looking 
through a kaleidoscope, the viewer’s eye seems drawn to a corner point such as 
C as a natural center instead of the origin, perhaps because of this symmetry!) 


Task 23.2.8. Adopt the u,v-coordinates of Task 23.2.7. Let α and β be the 
functions σ and τ of Task 23.2.6 expressed in terms of u and v. Let H be the 
subgroup of Sym(R^2) generated by α, β, and D_6. Let X be the region of R^2 
inside the kaleidoscope’s original triangle (including the triangle itself). Let 
Y = {f(a) | f ∈ H and a ∈ X}. Prove that Y = R^2. Conclude that if X is 
colored in any manner whatsoever, then the image in the kaleidoscope will 
include H in its symmetry group, and conversely, if a coloring of R^2 includes 
H in its symmetry group, then the coloring is a kaleidoscope image for some 
choice of coloring of X. Compare to Exercise 5.9. 


Remark 23.13. In this project, we treated kaleidoscopes whose mirrors were 
arranged vertically, as in a prism. For a treatment of kaleidoscopes whose 
mirrors all meet at a single point (which we may take as the origin), see [6]. 


23.3 The Axiom of Choice 


Prerequisites. This project may be undertaken at any point, but to understand 
all of the applications presented, the reader should have completed through 
Chapter 17. 


This project explains an axiom at the foundations of mathematics, and 
applies it to algebra. Roughly speaking, we resort to the Axiom of Choice 
(or its equivalents) when we are dealing with things that are “too infinite” to 
realize explicitly. The subject matter in this project is “deep” in the sense that 
a complete understanding requires a knowledge of the standard axioms of set 
theory. Since we are primarily studying algebra and not set theory or logic, 
we will concentrate on the applications instead of on the basic questions of 
logical consistency and independence among axioms; in fact, we do not even 
attempt to give a list of standard set axioms. 

It turns out that there are several different statements which are logically 
equivalent to the Axiom of Choice (AoC for short), each of which has its 
own peculiar flavor and patterns of use. We shall begin with the statement 
known as Zorn’s Lemma, as this leads to the others fairly easily. We need 
some preliminary definitions. 


Definition 23.14. Let ~ be a relation on a set S (in the sense of Definition 
8.3). We say that ~ is anti-symmetric if for all a, b ∈ S, if a ~ b and b ~ a, 
then b = a. 


We remind the reader that the notions of reflexive and transitive have been 
defined earlier (Definition 8.4). 


Definition 23.15. A partial order on a set S is a relation ≤ which is reflexive, 
transitive, and anti-symmetric. A set with a partial order is called a partially 
ordered set, or poset for short. We will use the symbol ≤ by default for a 
partial order. We also use the symbol < in the usual way, so a < b means 
a ≤ b and a ≠ b. 


Definition 23.16. A total order on a set S is a partial order ≤ such that for 
all a, b ∈ S, we have either a < b, or a = b, or b < a. (Note that by definition 
of partial order and <, at most one of these statements can be true!) 




Task 23.3.1. Prove that the following relations are partial orders: 

1. The usual less-than-or-equal relation ≤ on the set R; 

2. The subset relation ⊆ (also known as the inclusion relation) on the 
power set P(S) of a set S (by definition, P(S) is the set of all subsets of S); 

3. The restriction of any partial order on a set S to any subset T of S 
(note that technically, we must restrict to T^2, not T itself). 


Definition 23.17. An element m of a poset S is called maximal if there exists 
no element a of S such that m < a. An element M ∈ S is called a maximum 
element of S if for all a ∈ S we have a ≤ M. 


Task 23.3.2. Prove that in any poset, there is at most one maximum element, 
and if M is a maximum then M is also maximal. Show by example that a 
poset may have any number of maximal elements. 
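
For finite posets, the definitions above can be checked by brute force. The sketch below is only an illustration: the function names and the example (divisibility on the integers 1 through 10) are our own choices. It shows a poset with several maximal elements but no maximum element.

```python
from itertools import product

def is_partial_order(S, leq):
    """Check that leq is reflexive, anti-symmetric, and transitive on the finite set S."""
    reflexive = all(leq(a, a) for a in S)
    antisymmetric = all(a == b for a, b in product(S, S) if leq(a, b) and leq(b, a))
    transitive = all(leq(a, c) for a, b, c in product(S, S, S) if leq(a, b) and leq(b, c))
    return reflexive and antisymmetric and transitive

def maximal_elements(S, leq):
    """All m in S such that no a in S satisfies m < a."""
    return {m for m in S if not any(leq(m, a) and m != a for a in S)}

def maximum(S, leq):
    """The maximum element of S if it exists, and None otherwise."""
    for M in S:
        if all(leq(a, M) for a in S):
            return M
    return None

def divides(a, b):
    return b % a == 0

S = set(range(1, 11))
print(is_partial_order(S, divides))
print(sorted(maximal_elements(S, divides)))
print(maximum(S, divides))
```

Here 6, 7, 8, 9, and 10 are all maximal (none has a proper multiple in S), yet no single element is a multiple of every other, so there is no maximum.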


Definition 23.18. A chain is a totally ordered subset of a poset. More 
precisely, if ≤ is a partial order on S, then a chain in S is a subset C ⊆ S such 
that the restriction of ≤ to C is a total order. 


Definition 23.19. Let S be a poset, and let T ⊆ S. An element u of S is 
called an upper bound for T if for every a ∈ T we have a ≤ u. (Note that this 
is different from saying that u is a maximum element of T, since we may have 
u ∈ S − T.) 


Task 23.3.3. Consider the set of all real numbers R with the usual partial 
order ≤. Find a subset T of R such that T has an upper bound in R, but 
there is no upper bound for T in T itself. 


Now we are ready to state Zorn’s Lemma. It is a matter of tradition (but 
somewhat unfortunate) that this statement has been labeled a “lemma” in- 
stead of an axiom. 


Axiom 23.20 (Zorn’s Lemma). Suppose that S is a non-empty partially 
ordered set, and that every chain in S has an upper bound in S. Then S 
contains a maximal element. 


Remark 23.21. We often apply Zorn’s Lemma to a set S whose elements are 
sets partially ordered by inclusion. In these situations, a typical strategy is to 
show that the union of a chain in S is also an element of S. Then this union 
provides the required upper bound for the chain. 


Task 23.3.4. Let R be a non-trivial commutative ring with 1. Use Zorn’s 
Lemma to prove that R has at least one maximal ideal, by following the steps 
below. 

1. Let S be the set of all proper ideals of R, partially ordered by inclusion. 
Why is S non-empty? 

2. Let C be a chain in S. Prove that ⋃_{I ∈ C} I ∈ S. (Compare Exercise 20.14, 
and note that here, our chain is not indexed by the natural numbers; in fact, 
our chain may be uncountably infinite!) 

3. Use Zorn’s Lemma to conclude that S contains a maximal element M 
with respect to inclusion, which is the same thing as a maximal ideal of R. 




Often, a property of interest requires the existence of finitely many objects 
which (together) satisfy specified conditions. The following task illustrates an 
important example of this phenomenon. 


Task 23.3.5. Let V be a vector space over a field F, and let T ⊆ V. Prove 
that T is linearly independent over F if and only if every finite subset of T is 
linearly independent over F. 


The following task provides results which are useful in the situation above. 


Task 23.3.6. Let S be a set with a total ordering ≤, and let T ⊆ S. 

(a) Prove that T is totally ordered by the restriction of ≤. 

(b) Prove that if T is finite and non-empty, then T has a maximum element; 
use induction on |T| (do not use Zorn’s Lemma here). 


From part (b) of Task 23.3.6, we can see the relevance of our earlier com- 
ment that AoC is most useful in infinite situations. 


Task 23.3.7. Let V be a vector space over a field F. Use Zorn’s Lemma to 
prove the existence of a basis of V, as follows. 

1. Let S be the set of all linearly independent subsets of V, partially ordered 
by inclusion. Why is S non-empty? 

2. Let C be a chain in S. Prove that ⋃_{T ∈ C} T ∈ S. 

3. Use Zorn’s Lemma and Exercise 13.7 to complete the proof. 


We state the result of Task 23.3.7 as a proposition: 


Proposition 23.22. Assuming the Axiom of Choice, every vector space has 
a basis. 


The reader may be wondering by now exactly what the Axiom of Choice 
itself says. The following is one common formulation of AoC; the reader may 
encounter slight variations in different sources, but these should all be logically 
equivalent to each other. 


Axiom 23.23 (Axiom of Choice (AoC)). Let S be a set whose elements are 
non-empty sets. Then there exists a function f with domain S such that for 
all x ∈ S we have f(x) ∈ x. 


The AoC thus lets us “choose” one element from every set in a given 
collection of sets, hence its name; we call the function f a choice function for 
S. One immediate consequence is the following: 


Proposition 23.24. Assuming the Axiom of Choice, for every subgroup H 
of a group G, there exists a complete set of left coset representatives for H in G. 


Proof. Let H ≤ G be groups. Apply the Axiom of Choice to the set ℒ of all 
left cosets of H in G to get a function f : ℒ → G with the property that for 
all C ∈ ℒ we have f(C) ∈ C. Then the image {f(C) : C ∈ ℒ} is a complete 
set of left coset representatives for H in G. 
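
In a finite group no axiom is needed, because a choice function can be written down explicitly. The sketch below illustrates the construction in the proof for a hypothetical example of our own choosing: G = Z/12Z under addition and H = {0, 4, 8}, with the rule "take the least element" as the choice function.

```python
n = 12
H = {0, 4, 8}          # a subgroup of Z/12Z under addition

# The set of all (left) cosets g + H; each coset is a non-empty set.
cosets = {frozenset((g + h) % n for h in H) for g in range(n)}

# An explicit choice function f: pick the least element of each coset.
f = {C: min(C) for C in cosets}

# The image of f is a complete set of coset representatives.
representatives = sorted(f.values())
print(representatives)
```

The interest of the Axiom of Choice lies in situations where no such explicit rule is available.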




Next we sketch a proof relating Zorn’s Lemma to AoC. To accomplish 
such a proof on a firm footing, we should really work within the branch of 
mathematics known as axiomatic set theory, which is outside of our scope; we 
do note, for example, that one of the standard axioms of set theory guarantees 
the existence of the union of any two given sets, which is used in the proof 
below. For a good introduction to set theory based on the axiomatic approach, 
see [4]. 


Proposition 23.25. Zorn’s Lemma implies the Axiom of Choice. 


Proof. Let S be a set whose elements are non-empty sets. Let ℱ be the set of 
all functions f whose domain D(f) is a subset of S, and such that f(x) ∈ x 
for every x ∈ D(f). Note that ℱ is non-empty since we have ∅ ∈ ℱ. Partially 
order ℱ by restriction: that is, f ≤ g iff D(f) ⊆ D(g) and f = g|_{D(f)}. We 
leave it to the reader to verify that the union of any chain in ℱ is again in 
ℱ (treating a function formally as a set of ordered pairs; see Definition 1.16), 
so that every chain in ℱ has an upper bound in ℱ. 
Therefore, Zorn’s Lemma gives a maximal element f ∈ ℱ. Let D = D(f) be 
the domain of f. Assume for a contradiction that D ≠ S. Then since D ⊆ S, 
there exists A ∈ S − D; and further, there exists a ∈ A since A ≠ ∅. Now let 
g = f ∪ {(A, a)}, and notice that g ∈ ℱ and f < g, contradicting that f is 
maximal. Hence D = S, so f is a choice function for S. This completes the proof. 


Remark 23.26. As indicated earlier, the converse of Proposition 23.25 is also 
true. 


We now return to an assertion made back on page 10. 


Proposition 23.27. Let C = (S_i)_{i ∈ I} be an indexed collection of non-empty 
sets, where the index set I is non-empty. Assuming the Axiom of Choice, there 
is a direct product P of C in the category of sets, and we have P ≠ ∅. 


Task 23.3.8. Prove Proposition 23.27. Suggestion: Let T_i = {(i, a) : a ∈ S_i}, 
and let P be the set of all choice functions for {T_i : i ∈ I}. 


Next, we guide the reader through the proof of yet another important 
result which depends on AoC. 


Definition 23.28. Let F be a field. An algebraic closure of F is an extension 
field K of F such that K is algebraic over F and K is algebraically closed. 


Task 23.3.9. Let F be a field. Use Zorn’s Lemma to prove that an algebraic 
closure of F exists. Suggestion: Let S be the set of all algebraic extension fields 
of F (technically, we could restrict the underlying sets of these extension fields 
to guarantee that S is not a “proper class,” but we shall ignore this). Partially 
order S by the subfield relation, ≤. Show that the union of a chain in S is again 
in S. Prove by contradiction that any maximal element of S is algebraically 
closed. 

Task 23.3.10. Let F be a field, and suppose that K is an algebraic closure of 
F. Prove that there is no algebraically closed field L such that F ≤ L < K. 
(You should not need AoC or its equivalents to do this.) 




Remark 23.29. Although the Axiom of Choice is now widely accepted and 
usually passes without much remark, it does have consequences which are non- 
intuitive to some people. Perhaps the most notorious of these consequences 
is the so-called Banach-Tarski Paradox, which tells us that an ordinary, solid 
sphere in R^3 can be partitioned into finitely many subsets such that a finite 
sequence of spatial translations and rotations of these subsets can produce a 
solid sphere of twice the radius of the original. Although it has been termed 
a “paradox,” in fact this is a theorem which has been proved rigorously; the 
“trick” is that the proof uses AoC, and the subsets which are produced cannot 
have any well-defined volume individually! For a very accessible account of this 
result with plenty of background material included, see [16]. 

Aside from the Banach-Tarski Paradox, some people are not satisfied to 
use AoC because it produces a result “non-constructively”—that is, without 
providing a formula or algorithm for the choice function. To get a sense of 
this complaint, the reader may wish to try to find a complete set of coset 
representatives for (Q,+) in (R,+); such a thing is guaranteed to exist by 
AoC, but writing down an explicit description seems difficult. For a fairly 
thorough and mathematically detailed account of the history of AoC, see [12]. 


23.4 Some Category Theory 


Prerequisites. To complete this project, it is recommended to have finished 
through Chapter 21 of the text. The reader who has finished Chapter 13 
should be able to complete through Task 23.4.2. 


We have previously used the words category, object, and morphism without 
attempting to give precise definitions. In this project, we give a more formal 
treatment to these notions, and then give a corrected definition of isomorphism 
which applies to any category. 


Definition 23.30. A category is an ordered pair (Ob, Mor) satisfying the 
following axioms: 

(1) Ob is a collection of sets, called objects; 

(2) Mor is a collection of sets, called morphisms; further, for every A and 
B in Ob, there is a set Mor(A, B), called the set of morphisms from A to B, 
and Mor is the union of all these sets; 

(3) For every object A there is a distinguished element of Mor(A, A) called 
the identity morphism on A and written Id_A; 

(4) For every A, B, and C in Ob, and for every f ∈ Mor(A, B) and every 
g ∈ Mor(B, C), there is a morphism h ∈ Mor(A, C) called the composition of 
g with f and written h = g ∘ f; 

(5) For every A and B in Ob, and for every f ∈ Mor(A, B), we have 
f ∘ Id_A = f and Id_B ∘ f = f; 

(6) For every f ∈ Mor(A, B), g ∈ Mor(B, C), and h ∈ Mor(C, D), we have 
(h ∘ g) ∘ f = h ∘ (g ∘ f) (composition of morphisms is associative). 


Notation 23.31. Often we name our category. If we have a category C, then 
we may write Ob(C) for the collection of all objects in this category. We may 
also write Mor_C(A, B) instead of Mor(A, B) when we want to note which 
category a morphism comes from, and we write Mor(C) for the collection of 
all morphisms of the category C. 


Remark 23.32. Some care is needed to talk about categories correctly. For 
example, in the category of groups, G, the collection Ob(G) is the collection 
of all groups. It turns out that this collection is too big to be a set! Instead, it 
is what set theorists and logicians call a “proper class.” Similarly, Mor(G) is 
also a proper class. We can get into logical paradoxes if we are sloppy about 
these notions. 


Task 23.4.1. Describe each of the following using the language of categories, 
and verify that the Category Axioms are satisfied. You will be able to do most 
of the work by simply finding the appropriate results from earlier chapters of 
this text. (i) Groups; (ii) Rings; (iii) Vector Spaces. 

Remark 23.33. In all of our examples so far, a morphism from A to B has been 
a function from A to B, with certain special properties. But nothing in the 
Category Axioms requires this to be true in general; in principle, a morphism 
from A to B can be any set at all. However, in most categories we encounter, 
a morphism from A to B will at least include a function f : A → B as a 
component of the morphism, if not the entire morphism itself. In any case, we 
use the notation f : A → B to mean that f ∈ Mor(A, B), even when f is 
not simply a function from A to B. 


Notice that the Category Axioms do not mention “isomorphism.” Instead, 
the concept of an isomorphism is defined from the basic ingredients of cate- 
gories given in the axioms. Until now, we have defined an isomorphism to be 
a bijective homomorphism (or a bijective linear transformation, in the case 
of vector spaces); but this is not a good definition of isomorphism in more 
general categories, especially when you remember that a morphism does not 
even need to be a function! The correct general definition of isomorphism is 
given below. 


Definition 23.34. Let C be a category. Let A, B be objects of C. An 
isomorphism from A to B is a morphism f : A → B such that there exists a 
morphism g : B → A satisfying g ∘ f = Id_A and f ∘ g = Id_B. 


Task 23.4.2. Prove that our previous definitions of “isomorphisms” (as 
bijective morphisms) are equivalent to Definition 23.34 in the categories of 
groups, rings, and vector spaces. Again, previous results should be helpful. 


To introduce the reader to a new category-theoretic concept (called a direct 
limit) and simultaneously prepare the way for an example of a category in 




which a bijective morphism is not the same thing as an isomorphism, we will 
work to understand fields of prime characteristic somewhat better. 

Let p be an ordinary prime integer. We have seen that F_p := Z/pZ is 
a field, and that we can extend F_p to get a finite field of any order which 
is a power of p, by taking a splitting field of the polynomial x^a − x for an 
appropriate value of a. Furthermore, we saw that for each positive integer n, 
there is a unique field K_n with |K_n| = p^n, up to isomorphism in the category 
of rings. 

Task 23.4.3. Find justifications in the text for all statements made in the 
previous paragraph. Then use these statements to help show that for all positive 
integers m and n there is an embedding ε_{m,mn} : K_m → K_{mn}. 


Looking at the previous task, it is tempting to ask whether all of these 
embeddings are “going somewhere”: is there a big field K in which all of the 
K_n are embedded? If the maps ε_{m,mn} were simply inclusions, then we would 
want to take K to be the union of the K_n. Now an embedding is just as good 
as an inclusion, except that we need to identify elements of the embedded 
object with (differently-labeled) elements of the bigger object; so there ought 
to be a way of making something like a union out of these ingredients. Before 
we can accomplish this, however, we would like to ensure that the maps ε_{m,mn} 
are consistent with each other. That is the purpose of the following task. 


Task 23.4.4. (a) Prove that the number of distinct embeddings of K_m into 
K_{mn} is equal to m, using Galois theory. 

(b) For each n ∈ Z^+, fix a generator ω_n for K_n^×. Prove that there is a 
unique embedding σ_{m,mn} : K_m → K_{mn} such that σ_{m,mn}(ω_m) = ω_{mn}^q, where 
q = (p^{mn} − 1)/(p^m − 1). 

(c) Prove that the maps σ_{m,mn} are compatible in the sense that we have 
σ_{ℓm,ℓmn} ∘ σ_{ℓ,ℓm} = σ_{ℓ,ℓmn} for all ℓ, m, n ∈ Z^+. 
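
The exponent q in part (b) can be explored numerically. The sketch below uses only the fact that K_{mn}^× is cyclic of order p^{mn} − 1: it confirms that q is an integer and that the q-th power of a generator has order p^m − 1, which is the order of K_m^×. It checks orders only and does not address the field-theoretic part of the task. The function names and the sample values p = 2, m = 2, n = 3 are our own.

```python
from math import gcd

def q_exponent(p, m, n):
    """The exponent q = (p^(mn) - 1) / (p^m - 1), checked to be an integer."""
    big = p ** (m * n) - 1
    small = p ** m - 1
    assert big % small == 0
    return big // small

def order_of_power(N, q):
    """Order of w^q, where w generates a cyclic group of order N."""
    return N // gcd(N, q)

p, m, n = 2, 2, 3
q = q_exponent(p, m, n)                       # (2^6 - 1) / (2^2 - 1) = 63 / 3
order = order_of_power(p ** (m * n) - 1, q)   # order of the q-th power of a generator
print(q, order, order == p ** m - 1)
```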


Definition 23.35. Use the notation of Task 23.4.4. Let 

        S = ⋃_{n=1}^{∞} K_n, 

and define a relation ~ on S by a ~ b iff a ∈ K_m, b ∈ K_{mn}, and σ_{m,mn}(a) = b 
for some m, n ∈ Z^+, or vice-versa (with the roles of a and b switched, so that 
~ is symmetric). The direct limit of the fields K_n with respect to the maps 
σ_{m,mn} is the set K of equivalence classes of S under ~. 


Task 23.4.5. Prove that ~ really is an equivalence relation on S$. Then prove 
that K is a field under appropriately defined addition and multiplication. 
Remember to establish that these operations are well-defined and that K is 
closed under these operations! 

Task 23.4.6. Prove that for every positive integer n, we have K_n ↪ K as 
a field embedding; let L_n denote the image of K_n in K. Then prove that 
K = ⋃_{n=1}^{∞} L_n. Use these results to show that K is infinite, and that every 
element of K is algebraic over the prime subfield of K. 




Task 23.4.7. Prove that K is algebraically closed. 


Task 23.4.8. Define a function f : K → K by the formula f(a) = a^p. Prove 
that f ∈ Aut(K). 
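
One ingredient of Task 23.4.8 is that f respects addition, i.e., (a + b)^p = a^p + b^p in characteristic p. By the binomial theorem this follows once p divides each binomial coefficient C(p, k) with 0 < k < p. The sketch below checks that divisibility for a few small primes and shows that it fails for some composite numbers. It is a numerical illustration only, and the function name is our own.

```python
from math import comb

def middle_binomials_vanish_mod(p):
    """True if p divides C(p, k) for every k with 0 < k < p."""
    return all(comb(p, k) % p == 0 for k in range(1, p))

print([p for p in (2, 3, 5, 7, 11, 13) if middle_binomials_vanish_mod(p)])
print(middle_binomials_vanish_mod(4), middle_binomials_vanish_mod(6))
```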


As the reader has probably guessed, the concept of a direct limit can be 
defined in much greater generality. A definition in the category of rings is 
given next. 


Definition 23.36. Let I be a partially ordered set (see Definition 23.15) 
such that every pair {i, j} ⊆ I has an upper bound in I, and let {R_i}_{i ∈ I} be a 
collection of rings indexed by I. Suppose that whenever i, j ∈ I with i ≤ j, we 
have a ring homomorphism σ_{i,j} : R_i → R_j, and that these homomorphisms 
are compatible in the sense that whenever i ≤ j ≤ k we have σ_{j,k} ∘ σ_{i,j} = σ_{i,k}. 
Then a direct limit of the rings R_i with respect to the maps σ_{i,j} is a ring R 
together with ring homomorphisms ε_i : R_i → R such that for every ring 
T with ring homomorphisms τ_i : R_i → T compatible with the maps σ_{i,j}, 
there exists a ring homomorphism ψ : R → T making the following diagram 
commute: 


[Diagram (23.4): the maps σ_{i,j} : R_i → R_j, ε_i : R_i → R, ε_j : R_j → R, τ_i : R_i → T, τ_j : R_j → T, and ψ : R → T] 


Note: when we say that the τ_i are compatible with the σ_{i,j}, we mean that the 
outer triangle in Diagram 23.4 commutes; that is, τ_j ∘ σ_{i,j} = τ_i whenever i ≤ j. 


Task 23.4.9. Verify that the field K constructed in Definition 23.35 is a direct 
limit of the rings K_n according to Definition 23.36, taking I = Z^+, with the 
partial order on I defined by divisibility in Z (instead of the usual ≤ relation), 
and the maps σ_{m,mn} from Task 23.4.4. 


Task 23.4.10. Consider the category C whose objects are fields, and whose 
morphisms are ring homomorphisms given by polynomial functions from a field 
to itself. To be more precise: if A and B are two distinct fields, then we take 
Mor(A, B) = ∅. We take Mor(A, A) to be the set of all ring homomorphisms 
f : A → A such that there exists a single polynomial g ∈ A[x] satisfying 
f(a) = g(a) for all a ∈ A. (This may seem artificial, but this kind of map is 
important in the area of math called algebraic geometry.) 

Let f and L_n be as in Tasks 23.4.8 and 23.4.6, respectively. Prove that f is 
a morphism in the category C, and that f is a bijective function. Also prove 
that for every positive integer n, the restriction of f to L_n is an isomorphism 
in the category C. But prove that f itself is not an isomorphism in C. Thus, in 
the category C, a bijective morphism is not the same thing as an isomorphism. 


For a somewhat more in-depth discussion of category theory that is still 
not too abstract, see [3, Appendix 5]. For a treatment of direct limits and 
more in this vein, see [10, Chapter III]. For one of the great classics in general 
category theory, see [11]. Also, note that the subject of homological algebra is 
intertwined with category theory: specifically, with a nice type called abelian 
categories. For good sources on homological algebra, see [17] or [13]. 


23.5 Linear Algebra: Change of Basis 


Prerequisites. Chapter 13 (including the exercises on linear transformations). 


Let V and W be finite-dimensional vector spaces over the same field F, 
with dim_F(V) = n and dim_F(W) = m. We saw in Exercises 13.15 through 
13.18 that there is a one-to-one correspondence between matrices in M_{m,n}(F) 
and linear transformations σ : V → W, if we first choose a basis B of V 
and a basis C of W. In this project, we investigate exactly how the matrix 
representing σ depends on the choice of bases. 

As noted in Exercise 13.15, to form the matrix of a linear transformation 
requires a knowledge of how our basis elements are to be ordered. Therefore, 
we give the following formal definition: 


Definition 23.37. Let V be a vector space of dimension n < ∞ over a field 
F. An ordered basis of V over F is a list (b_1, ..., b_n) of elements of V such 
that the set {b_1, ..., b_n} is a basis of V over F. 


Task 23.5.1. Prove that the elements listed in an ordered basis must be dis- 
tinct. 


We also formalize the idea that an ordered basis provides a coordinate 
system for a vector space: 


Definition 23.38. Let V be a vector space over F of dimension n < ∞, and 
let B = (b_1, ..., b_n) be an ordered basis of V over F. Let v ∈ V. Then the 
coordinate vector of v with respect to B is the element c = (c_1, ..., c_n) ∈ F^n, 
where the c_i are the unique elements of F such that v = c_1 b_1 + ... + c_n b_n. 
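
When V = Q^n, finding a coordinate vector amounts to solving a linear system. The sketch below does this by Gauss-Jordan elimination with exact rational arithmetic. The function name coordinate_vector and the sample basis are our own choices for illustration, and the code assumes its input really is an ordered basis.

```python
from fractions import Fraction

def coordinate_vector(basis, v):
    """Solve c_1*b_1 + ... + c_n*b_n = v over Q by Gauss-Jordan elimination."""
    n = len(basis)
    # Augmented matrix whose columns are b_1, ..., b_n followed by v.
    rows = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(v[i])]
            for i in range(n)]
    for col in range(n):
        # Find a row with a non-zero entry in this column
        # (raises StopIteration if the input list is not a basis).
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [x / lead for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return tuple(rows[i][n] for i in range(n))

# Hypothetical example in Q^2 with ordered basis B = ((1, 1), (1, -1)).
B = [(1, 1), (1, -1)]
c = coordinate_vector(B, (3, 1))
print(c)   # the coefficients c_1, c_2 with c_1*(1, 1) + c_2*(1, -1) = (3, 1)
```

Reordering the basis permutes the entries of the coordinate vector, which is why the definition insists on an ordered basis.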


Next, we rephrase the results of the aforementioned exercises to emphasize 
a useful point of view. 
Task 23.5.2. Let V be a vector space over a field F of finite dimension n. 

(a) Let B = (b_1, ..., b_n) be an ordered basis of V over F. Prove that B 
has the following property: For every choice of vectors v_1, ..., v_n ∈ V (with 
repetition allowed), there is a unique linear transformation σ : V → V such 
that σ(b_i) = v_i for all i. 

(b) Prove that a list B = (b_1, ..., b_n) with b_i ∈ V is an ordered basis of V 
over F if and only if B has the property stated in part (a) above. Compare 
this to the results in Example 7.12 and Exercise 14.10. The common theme 
in these three situations is the “freeness” of vector spaces, free groups, and 
polynomial rings. 


Our approach to studying the effect of basis choice on the matrix of a linear 
transformation is to take two choices of ordered basis for the same vector space 
and compare the resulting matrices. From this point on, we fix the following 
notation: 

Let σ : V → W be a linear transformation, where dim_F(V) = n and
dim_F(W) = m. Let B = (b_1, ..., b_n) and B′ = (b′_1, ..., b′_n) be ordered bases
of V over F. Let C = (c_1, ..., c_m) be an ordered basis of W over F. Let M
be the matrix of σ with respect to B and C, and let M′ be the matrix of σ
with respect to B′ and C.


Task 23.5.3. Let M_i be the i-th column of M. Confirm that M_i is the coordinate
vector of σ(b_i) with respect to C. Make a similar statement for M′_i, the i-th
column of M′.


Task 23.5.4. Let α : V → V be the unique linear transformation sending b_i
to b′_i for each i. Prove that α is a vector space isomorphism.


Next we further fix the following notation: 

Let α be the isomorphism of Task 23.5.4. Let A be the matrix of α with
respect to B (recall that this means we choose B as the basis for V in both
of its roles as domain and codomain of α).


Task 23.5.5. Prove that A ∈ GL_n(F). (Use the result of Task 23.5.4.) More
precisely, show that A^{-1} is the matrix of the linear transformation sending b′_i
to b_i, with respect to the basis B.


Task 23.5.6. Let v ∈ V, and let c be the coordinate vector of v with respect
to B. View c as a column vector, that is, as an element of M_{n,1}(F). Prove
that Ac is the coordinate vector of v with respect to B′, viewed as a column
vector.


Task 23.5.7. Prove that M′ · A = M, hence M′ = M · A^{-1}.


Now that we have explored how a change of basis in the domain affects
the matrix of σ, let us see what happens when we change the basis of the
codomain. To this end, let C′ = (c′_1, ..., c′_m) be another ordered basis of W
over F. Let γ : W → W be the unique linear transformation sending c_i to
c′_i, and let G be the matrix of γ with respect to C. Let M″ be the matrix of
σ with respect to B and C′.


Task 23.5.8. Prove that M″ = G · M.




Next, we allow both bases to change: 


Task 23.5.9. Let N be the matrix of σ with respect to B′ and C′. Prove that
N = G · M · A^{-1}.

Now that we have understood the general change-of-basis formula, we turn 
to the important special case when W = V. 


Task 23.5.10. Suppose that W = V, C = B, and C′ = B′. Thus σ : V → V,
and the matrices representing σ are square. Show that in this situation we
have α = γ and G = A, so that


N = A · M · A^{-1}. (23.5)


Definition 23.39. Let V be a vector space over a field F, with dim_F(V) =
n < ∞. Let B and B′ be ordered bases of V over F, and let A ∈ M_n(F) be
the matrix whose i-th column is the coordinate vector of the i-th element of B′
with respect to B. Then we call A the change-of-basis matrix from B to B′.


Equation 23.5 is perhaps the most important change-of-basis formula. We
recognize this formula as saying that N is the conjugate of M by A in the
group GL_n(F) (see Definition 7.15). From Exercise 9.9, we know that N is
the image of M under an automorphism, and hence we expect that N and
M should “look the same” with respect to every aspect of the group GL_n(F).
This makes sense, since in fact N and M represent exactly the same linear
transformation, only with respect to different choices of basis! What is more,
our new knowledge from linear algebra sheds light on the meaning of
conjugation: to apply the conjugate of M by A is to first change coordinates from
B′ to B (by applying A^{-1}); then to apply our function σ using the coordinate
system of B (by multiplying by M); and finally to change coordinates back
from B to B′ (by applying A). The result, N, provides the way to apply the
same function σ using the coordinate system of B′ instead of B.

Task 23.5.11. Let Q ∈ GL_n(F) be any invertible n by n matrix over F. Prove
that Q is the change-of-basis matrix from B to B″ for some basis B″ of V
over F.


Definition 23.40. Two matrices which are related by Equation 23.5 are called
similar. We write M ~ N to mean that there exists an invertible matrix A
satisfying Equation 23.5.
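
As a small numerical illustration of Equation 23.5, the following Python sketch conjugates a 2 × 2 rational matrix M by an invertible matrix A using exact arithmetic. The helper names (`mat_mul`, `inverse2`) and the sample matrices are assumptions made for this example only. It checks the equivalent form N · A = A · M, which avoids the inverse, and also that conjugating N by A^{-1} recovers M, so that similarity is a symmetric relation.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two 2 x 2 matrices given as nested lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse2(X):
    """Inverse of a 2 x 2 matrix via the usual ad - bc formula."""
    d = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    if d == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(X[1][1], d), Fraction(-X[0][1], d)],
            [Fraction(-X[1][0], d), Fraction(X[0][0], d)]]

M = [[1, 2], [3, 4]]             # sample matrix (an assumption)
A = [[2, 1], [1, 1]]             # sample invertible matrix (an assumption)

# N = A * M * A^{-1}, as in Equation 23.5.
N = mat_mul(mat_mul(A, M), inverse2(A))

# Conjugating N by A^{-1} undoes the conjugation by A.
M_back = mat_mul(mat_mul(inverse2(A), N), A)
```

The entries of `N` differ from those of `M`, even though Equation 23.5 says that both matrices describe the same linear transformation in different coordinates.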


23.6 Linear Algebra: Determinants 
Prerequisites. Chapter 13 and Project 23.5. 


Recall that a matrix is a (finite) grid of elements which are called the entries
of the matrix. For now, the entries will belong to a field, and we will only be
concerned with square matrices. Let us fix a field F and a finite-dimensional
vector space V over F with dim_F(V) = n. We saw in Exercises 13.15 through
13.18 that every linear transformation from V to V can be represented as an
n × n matrix that depends on a choice of basis of V over F. We saw in Project
23.5 that changing the basis we use for V amounts to conjugating this matrix
by an invertible matrix.


FIGURE 23.6: Applying the linear transformation σ to the unit square S (left)
results in the parallelogram σ(S) (right)

Recall that F^n is a vector space over F under componentwise
operations (see Example 13.4 and Exercise 13.12). By re-writing elements of F^n
as columns instead of rows, we will view each column of a matrix M ∈ M_n(F)
as a “column vector” in F^n.

Our motivation in this project comes from the special case when F = R,
the field of real numbers. Specifically, we will develop a theory motivated by
the following question: by what factor does a linear transformation σ : R^n →
R^n change the area (or volume, or generalized volume) of a region in R^n?
Consider as an example the matrix


M = ( … ) ∈ M_2(R). (23.6)

Suppose that we use the basis B = (e_1, e_2) to interpret M as a linear
transformation σ, where we write elements of R^2 as column vectors, with


e_1 = (1, 0)^T and e_2 = (0, 1)^T.


Then we can read the value of σ(e_i) as the i-th column of M. Finally, by
identifying the point (a, b) with the column vector (a, b)^T, we can display
the effect of σ on the square [0,1] × [0,1] ⊂ R^2. The result is
shown in Figure 23.6.

To help with the next task, we recall that the area of a parallelogram is 
equal to its length multiplied by its height (see Figure 23.7). We can use any 
side of the parallelogram as the “length” side. 




FIGURE 23.7: Area of parallelogram = L · H


Task 23.6.1. Show that the area of the parallelogram σ(S) in Figure 23.6
is equal to 11, which is also equal to |M| according to the formula given in
Exercise 4.10.


Task 23.6.2. Show that for any linear transformation σ : R^2 → R^2, the
image of the unit square [0,1] × [0,1] will be a parallelogram, a line segment,
or the origin. What can you say about when each case occurs?


Our strategy will be to use three properties of area, suitably generalized 
to n-dimensional space, as axioms for something we shall call a “determi- 
nant.” Thus, a determinant is a generalized version of area. The importance 
of the concept of area suggests that determinants will be quite useful; indeed, 
determinants appear in critical places in both algebra and analysis. 


Definition 23.41. Let F be a field and let n ∈ Z^+. A determinant is a
function det : M_n(F) → F which satisfies the following three axioms:


1. det(I) = 1, where I is the n × n identity matrix.


2. If M is an n × n matrix, a ∈ F, and N is the matrix obtained
by replacing a column c of M by the column vector ac, then we
have det(N) = a · det(M).

3. If M is an n × n matrix, c_i and c_j are the i-th and j-th columns of
M, respectively, with i ≠ j, and N is the matrix obtained from M
by replacing column c_i by c_i + c_j, then we have det(N) = det(M).
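
The three axioms can be spot-checked on a computer for n = 2. The Python sketch below assumes the familiar 2 × 2 formula ad − bc as a candidate determinant and tests the axioms on one sample rational matrix; the helper names and the sample values are assumptions for illustration. A check on one matrix is of course not a proof, only a sanity test of what the axioms say.

```python
from fractions import Fraction

def det2(M):
    """The 2 x 2 formula ad - bc, with M given as a list of rows."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def scale_column(M, j, a):
    """Axiom 2 operation: multiply column j of M by the scalar a."""
    return [[a * x if col == j else x for col, x in enumerate(row)] for row in M]

def add_column(M, i, j):
    """Axiom 3 operation: replace column i of M by (column i) + (column j)."""
    return [[x + row[j] if col == i else x for col, x in enumerate(row)] for row in M]

I2 = [[1, 0], [0, 1]]
M = [[Fraction(3), Fraction(-1)], [Fraction(2), Fraction(5)]]   # sample matrix
a = Fraction(-7, 2)                                             # sample scalar

axiom1 = det2(I2) == 1
axiom2 = all(det2(scale_column(M, j, a)) == a * det2(M) for j in range(2))
axiom3 = all(det2(add_column(M, i, j)) == det2(M)
             for i in range(2) for j in range(2) if i != j)
```

All three flags evaluate to `True` for this sample, including the case of a negative scalar in Axiom 2.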


Task 23.6.3. Show geometrically that the Determinant Axioms are true for 
F =R and n = 2 if we replace det by the area of the parallelogram formed 
from the column vectors of a 2 x 2 real matrix, in the case a > 0. 


Remark 23.42. In Axiom 2, when F = R, we may have a < 0. In light of
Axiom 1, we see that a determinant can be negative. Sometimes the determinant
is referred to as a “signed area” or “signed volume” because of this fact. It
turns out to be better to allow negative real values for the determinant than
to force the determinant to be positive all the time; for example, general fields
do not have a notion of positive or negative!


We shall see that there is a unique determinant function which satisfies 
our three axioms. To start with, you will establish a few consequences of the 
axioms, after we introduce the concept of a so-called “diagonal” matrix. 




Definition 23.43. An n × n square matrix M is called diagonal if M_{i,j} = 0
whenever i ≠ j; here, as usual, M_{i,j} denotes the entry in row i, column j of
M. That is, a matrix is diagonal if its only non-zero entries occur on the main
diagonal of M. We denote a diagonal matrix by diag(d_1, d_2, ..., d_n), where
the d_i are the diagonal entries, in order from upper-left to lower-right.


Remark 23.44. Visually, a diagonal matrix looks as follows:


d_1   0    0   ⋯   0
 0   d_2   0   ⋯   0
 0    0   d_3  ⋯   0
 ⋮    ⋮    ⋮   ⋱   ⋮
 0    0    0   ⋯  d_n


Diagonal matrices are the simplest types of matrices after scalar matrices.

Task 23.6.4. Let det be a determinant function. Prove that det has the
following properties:


4. If any column of the matrix M is zero (that is, its entries are
all 0), then det(M) = 0.

5. If M = diag(d_1, d_2, ..., d_n), then det(M) = d_1 · d_2 ⋯ d_n.

6. (a) If we replace the i-th column c_i of M by the column vector
c_i + a · c_j, for any a ∈ F and any column c_j with j ≠ i, then the
determinant does not change.

(b) More generally, if we replace c_i by c_i + Σ_{j≠i} a_j c_j for any
elements a_j ∈ F, then the determinant does not change. Suggestion:
Use part (a) inductively.

7. If we switch any two columns of a matrix, then the determinant
is multiplied by −1.


Next we will clarify a small but important issue about the definition of 
linear dependence. Originally, we defined linear dependence as a property of 
a set of vectors (Definition 13.22). But now, we wish to talk about whether 
the column vectors of a matrix are linearly dependent. If there are duplicate 
columns, then our present definition is not quite what we need. For example, 
if M ∈ M_2(F) has two non-zero but identical columns, then the set of columns
of M is linearly independent, but we would like to say that the list of columns 
of M is linearly dependent. 


Definition 23.45. Let V be a vector space over a field F, and let L =
(v_1, ..., v_n) be an ordered list of elements of V. Then L is linearly dependent
over F if there exist elements a_1, ..., a_n ∈ F, not all 0, such that Σ_{i=1}^n a_i v_i =
0. Otherwise, L is called linearly independent over F.


Task 23.6.5. (a) Prove that a list L = (v_1, ..., v_n) of vectors is linearly
dependent iff either the set {v_1, ..., v_n} is linearly dependent, or the list contains
duplicates, that is, there exist i ≠ j with v_i = v_j.




(b) Prove that if any sublist of a list L is linearly dependent, then so is L 
(define “sublist” appropriately to make this work!). 

Task 23.6.6. Prove that every determinant function has the following
property:


8. If the list of column vectors of M is linearly dependent over F,
then we have det(M) = 0.


Now that the concept of linear dependence in F^n has entered the picture,
we should establish a basic fact for use later. 


Task 23.6.7. Let F be a field and let n ∈ Z^+. Let j be an integer such that
1 ≤ j ≤ n. Set W = {(a_1, ..., a_n) ∈ F^n | a_i = 0 for 1 ≤ i ≤ j}. Prove that
W ≤ F^n and dim_F(W) = n − j.

Task 23.6.8. Let M ∈ M_n(F) be a matrix with columns c_1, ..., c_n. Suppose
that (c_1, ..., c_n) is linearly independent over F. Show that, starting with M,
we can use Properties 6(a) and 7 repeatedly to produce a diagonal matrix
D = diag(d_1, ..., d_n) where each diagonal entry d_i is non-zero. Conclude that
det(M) ≠ 0.


In view of Determinant Property 8 and Task 23.6.8, we have the following 
result: 


Proposition 23.46. Let M ∈ M_n(F). Then det(M) = 0 if and only if the
list of columns of M is linearly dependent over F.


Remark 23.47. The result of Proposition 23.46 is consistent with the
interpretation of the determinant as a generalized area; for the determinant to be
zero, the matrix columns must not span all of F^n, but instead span a
smaller-dimensional space. For example, in R^2 the “area” of a line segment or a point
is 0, while the area of a true 2-dimensional parallelogram is non-zero.


Definition 23.48. Let M ∈ M_n(F). Then M is called singular if det(M) = 0;
otherwise, M is called non-singular.


Task 23.6.9. Let M ∈ M_n(F), and suppose that ω : V → V is a linear
transformation such that M is the matrix of ω with respect to some basis of
V over F. Prove that M is singular iff ker(ω) ≠ 0.

Next, we build an explicit formula for a determinant function. 


Task 23.6.10. Let M ∈ M_n(F) be a square matrix with column list
(c_1, ..., c_n). Let N be the matrix obtained from M by replacing column
number i by the column vector Σ_{j=1}^n a_j c_j, where a_j ∈ F. Prove that we have
det(N) = a_i · det(M).

Task 23.6.11. Prove that every determinant function has the following
property:


9. Let M be a square matrix whose i-th column can be written as
a sum, u + v. Let M′ and M″ be the matrices obtained from M by
replacing the i-th column by u and by v, respectively. Then we have
det(M) = det(M′) + det(M″).

Task 23.6.12. Let M ∈ M_n(F) be a square matrix. Let ℱ be the set of all
functions f : [n] → [n], where [n] = {1, 2, ..., n}. For f ∈ ℱ, let M_f denote
the matrix whose entry in row i, column f(i) is the same as the corresponding
entry in M, with all other entries zero. Use Determinant Property 9 repeatedly
to prove the following formula:


det(M) = Σ_{f ∈ ℱ} det(M_f).


Task 23.6.13. Prove that, in the notation of Task 23.6.12, we have det(M_f) = 0
if f is not bijective. Conclude that we have


det(M) = Σ_{f ∈ S_n} det(M_f).


We are not far from our goal of obtaining an explicit determinant formula. 
The next step is to convert the matrices which appear in Task 23.6.13 into 
diagonal matrices by repeatedly switching pairs of columns. To that end, we 
make the following definition. 


Definition 23.49. Let f ∈ S_n. Let E ∈ M_n(F) be the matrix all of whose
entries are 1. Let s(f) be the minimum number of switches of column pairs
needed to transform E_f into the identity matrix I_n. Then the sign of f is


sign(f) = (−1)^{s(f)}.


Task 23.6.14. Prove that for all f ∈ S_n, we have 0 ≤ s(f) ≤ n − 1. In
particular, conclude that s(f) exists.


Task 23.6.15. Prove that every determinant function satisfies the following
formula:


10. det(M) = Σ_{f ∈ S_n} sign(f) · M_{1,f(1)} M_{2,f(2)} ⋯ M_{n,f(n)}, where as usual
M_{i,j} denotes the entry of M in row i, column j.


Conclude that there is at most one determinant function for a given positive
integer n and field F.
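
The formula in Determinant Property 10 translates almost directly into a short program. The Python sketch below is one possible rendering, under two assumptions that are not part of the text: rows and columns are indexed from 0 rather than 1, and sign(f) is computed from the parity of the number of out-of-order pairs of f (the quantity that Tasks 23.6.18 and 23.6.19 relate to Definition 23.49) rather than from a minimal sequence of column switches. The sum has n! terms, so this is practical only for small n.

```python
from itertools import permutations

def sign(f):
    """Sign of a permutation f of (0, ..., n-1), from the parity of its out-of-order pairs."""
    n = len(f)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if f[i] > f[j])
    return -1 if inversions % 2 else 1

def det(M):
    """Sum over all permutations f of sign(f) * M[0][f(0)] * ... * M[n-1][f(n-1)]."""
    n = len(M)
    total = 0
    for f in permutations(range(n)):
        term = sign(f)
        for i in range(n):
            term *= M[i][f[i]]
        total += term
    return total

d2 = det([[1, 2], [3, 4]])                       # agrees with ad - bc
d3 = det([[2, 0, 1], [1, 3, 2], [0, 1, 1]])      # sample 3 x 3 matrix
```

For a 2 × 2 matrix the sum has exactly two terms, one for the identity permutation and one for the transposition, and it reduces to ad − bc.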


To prove the existence of a determinant function, we must show that the
formula in Determinant Property 10 satisfies the three Determinant Axioms.
The main difficulty here is to show that Definition 23.49 is strong enough
to handle any sequence of column switches that brings E_f to the identity
matrix, not just a minimal such sequence. Until we know this, we will denote
the function on the right-hand side of Determinant Property 10 by Δ.


Task 23.6.16. Prove that the function Ψ : S_n → GL_n(F) given by the
formula Ψ(f) = E_f is an embedding of groups.




Task 23.6.17. Let f ∈ S_n, and let N be the matrix obtained from E_f by
switching column number a with column number b for some a ≠ b. Let h ∈ S_n
be the permutation which switches a and b while leaving all other elements
of [n] fixed. Prove that N = E_g where g = fh. Note: such a permutation h is
called a transposition; see also Exercises 22.12 and 22.13.


Task 23.6.18. For a permutation f ∈ S_n, define


t(f) = |{(i, j) ∈ [n]^2 : i < j and f(i) > f(j)}|,


the number of pairs which f puts “out of order.” Prove that if h is a
transposition in S_n, then we have t(fh) ≡ 1 + t(f) (mod 2). Suggestion: Let a < b
be the elements of [n] switched by h. Compare f to fh and show that, except
for the pair (a, b) itself, the pairs which change from in-order to out-of-order
or vice versa naturally come in pairs.


Task 23.6.19. Let f ∈ S_n, and suppose that f can be written as a product of k
transpositions. Use Task 23.6.18 to prove that we must have (−1)^k = (−1)^{t(f)}.
Conclude that, in particular, sign(f) = (−1)^{t(f)}.
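
Before attempting the proof in Task 23.6.18, it can be reassuring to test the parity claim by brute force. The following Python sketch does this for n = 4. Permutations are stored as tuples on {0, ..., n−1} instead of {1, ..., n}, and the product fh is taken to mean "apply h first, then f"; both conventions are assumptions of this sketch. The parity statement does not depend on the order of composition.

```python
from itertools import permutations

def t(f):
    """Number of pairs i < j with f(i) > f(j), where f is the tuple (f(0), ..., f(n-1))."""
    n = len(f)
    return sum(1 for i in range(n) for j in range(i + 1, n) if f[i] > f[j])

def compose(f, h):
    """The permutation i -> f(h(i)); this order of composition is an assumption."""
    return tuple(f[h[i]] for i in range(len(f)))

def transposition(n, a, b):
    """The permutation of {0, ..., n-1} switching a and b and fixing everything else."""
    h = list(range(n))
    h[a], h[b] = h[b], h[a]
    return tuple(h)

n = 4
parity_always_flips = all(
    t(compose(f, transposition(n, a, b))) % 2 == (1 + t(f)) % 2
    for f in permutations(range(n))
    for a in range(n)
    for b in range(a + 1, n)
)
```

The exhaustive check covers all 24 permutations and all 6 transpositions in S_4; it suggests, but does not prove, the general statement.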


Task 23.6.20. Prove that Δ satisfies Determinant Property 7.


Task 23.6.21. Prove that the function Δ defined by Determinant Property
10 satisfies the three Determinant Axioms. Conclude that there is a unique
determinant function. Suggestion: First show that Δ satisfies Determinant
Axioms 1 and 2 as well as Determinant Property 9. Then use the previous
results to establish the final Determinant Axiom 3.


Finally, we will show that the formula from Exercise 4.10(a) generalizes 
to matrices of any size. Instead of trying to prove this result using explicit 
formulas, which would be messy, we will take advantage of the Determinant 
Axioms. 


Task 23.6.22. Let A, B ∈ M_n(F). Use the steps below to prove the following
property:


11. det(A · B) = det(A) · det(B).


(a) First suppose that det(A) = 0. Prove that det(A · B) = 0 too by using
the correspondence between matrices and linear transformations.

(b) Now suppose that det(A) = a ≠ 0, so a ∈ F^×. For any matrix
M ∈ M_n(F), let δ(M) = det(A · M) · a^{-1}. Show that δ satisfies the three
determinant axioms. Conclude that δ = det, hence det(A · B) = det(A) · det(B),
as desired.
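
The function δ from part (b) can be tried out numerically for n = 2. In the Python sketch below, the matrices A and B and the helper names are assumptions chosen for illustration, and the determinant is computed by the 2 × 2 formula ad − bc. The sketch checks that δ sends the identity matrix to 1 (Axiom 1) and that δ agrees with det on one sample matrix, which is what Property 11 predicts.

```python
from fractions import Fraction

def det2(M):
    """The 2 x 2 formula ad - bc, with M given as a list of rows."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def mat_mul(X, Y):
    """Product of two 2 x 2 matrices given as nested lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[2, 1], [1, 3]]             # sample matrix with det(A) = 5, so a = 5 is a unit in Q
a = Fraction(det2(A))

def delta(M):
    """delta(M) = det(A * M) * a^{-1}, as in Task 23.6.22(b)."""
    return det2(mat_mul(A, M)) / a

I2 = [[1, 0], [0, 1]]
B = [[4, -1], [2, 6]]            # sample matrix

delta_of_identity = delta(I2)
delta_of_B = delta(B)
```

Comparing `delta_of_B` with `det2(B)` is the same as comparing det(A · B) with det(A) · det(B).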


At last, we can realize our original goal of measuring the factor by which 
(generalized) area increases under a linear transformation. The next task con- 
firms our intuition that this factor, the determinant, should be a property of 
a linear transformation itself, independent of any choice of basis. 


Task 23.6.23. Let V be a vector space over a field F, and let σ : V → V be
a linear transformation. Use the results of Project 23.5 and Task 23.6.22 to
prove that the determinant of a matrix of σ does not depend on which basis
of V we choose. Thus, we can define det(σ) without reference to a basis.


We next present another remarkable property of determinants. First we 
introduce a new matrix operation that switches the rows and columns of a 
matrix. 


Definition 23.50. Let M ∈ M_{m,n}(F). The transpose of M is the matrix
N ∈ M_{n,m}(F) such that for all i, j we have N_{i,j} = M_{j,i}. We write N = M^T.
We say that M is symmetric if we have M^T = M.


Task 23.6.24. Use Determinant Property 10 to prove that for any square
matrix M we have det(M^T) = det(M).


We saw in Determinant Property 5 that the determinant of a diagonal 
matrix is easy to compute: it is just the product of the diagonal entries. 
Now that we understand determinants better, we can generalize that result to 
matrices of a somewhat less special form which is often attainable in practice. 


Definition 23.51. Let M ∈ M_n(F). Then M is called an upper triangular
matrix if we have M_{i,j} = 0 whenever i > j, and M is called lower triangular
if M_{i,j} = 0 whenever i < j. We say that M is triangular if M is either upper
triangular or lower triangular.


Task 23.6.25. Prove that the determinant of a triangular matrix is equal to 
the product of its diagonal entries. 
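
One way to see why Task 23.6.25 should be true is to look at which terms of the formula in Determinant Property 10 can be non-zero for a triangular matrix. The Python sketch below lists, for a sample upper triangular matrix (an assumption chosen for illustration, with rows and columns indexed from 0), the permutations f whose product M_{1,f(1)} ⋯ M_{n,f(n)} is non-zero. It is an experiment that points toward a proof, not a proof.

```python
from itertools import permutations

def surviving_permutations(n):
    """Permutations f of {0, ..., n-1} with f(i) >= i for every row index i.

    For an upper triangular matrix, the entry in row i, column f(i) is forced
    to be 0 as soon as i > f(i), so only these f can give a non-zero product.
    """
    return [f for f in permutations(range(n))
            if all(f[i] >= i for i in range(n))]

def nonzero_products(M):
    """Map each permutation f to M[0][f(0)] * ... * M[n-1][f(n-1)], keeping non-zero values."""
    n = len(M)
    products = {}
    for f in permutations(range(n)):
        product = 1
        for i in range(n):
            product *= M[i][f[i]]
        if product != 0:
            products[f] = product
    return products

U = [[2, 7, -1],
     [0, 3, 5],
     [0, 0, 4]]                  # sample upper triangular matrix
nonzero = nonzero_products(U)
only_identity = surviving_permutations(4)
```

In this sample only the identity permutation survives, and its product is the product of the diagonal entries.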


23.7 Linear Algebra: Eigenvalues 
Prerequisites. Chapter 13 and Project 23.6. 


“Since we lack the time to carry out this experiment in practice, Eigenfac- 
tor uses mathematics to simulate this process.” [2] 

Let V be a finite-dimensional vector space over a field F, with dim_F(V) =
n ≥ 1. In this project we again focus on linear transformations σ : V → V
from the vector space to itself. We know that there can be many different
matrix representations of the same linear transformation σ, depending on our
choice of a basis of V over F. Our goal is to find the “best” basis to get the
simplest possible matrix of σ.

We have seen that the simplest types of square matrices are, in order,
the identity matrix; scalar matrices; and diagonal matrices. Now we know
that if the matrix of a linear transformation with respect to some basis is a
scalar matrix, then the matrix with respect to every basis will be that same
scalar matrix (see Exercise 13.18). Thus we cannot hope to simplify the matrix
form of a linear transformation from a non-scalar matrix to a scalar matrix.
Therefore, we will ask for the next simplest thing: given σ, is there a basis of
V over F for which the matrix of σ is diagonal?

What would it mean to say that the matrix M of σ with respect to the
basis B = {b_1, ..., b_n} is the diagonal matrix diag(d_1, ..., d_n)? Well, this
means that


σ(b_i) = d_i b_i


for all i ∈ {1, 2, ..., n}. In other words, the effect of σ on each basis vector is
simply to multiply that vector by a certain scalar. With this motivation, we
make the following definitions.


Definition 23.52. An eigenvalue of a linear transformation σ : V → V is
an element λ ∈ F such that we have σ(v) = λv for some vector v ∈ V − {0}.
We call such a vector v an eigenvector of σ.


Definition 23.53. A linear transformation σ : V → V is called
diagonalizable if there is a basis B of V over F such that the matrix of σ with respect
to B is diagonal.


Remark 23.54. Note the requirement in Definition 23.52 that v ≠ 0. Indeed,
our motivating situation required v to be an element of a basis of V, so v can
only be the zero vector if V = {0}, which is forbidden by our assumption that
dim_F(V) ≥ 1. If we allowed v = 0 in this definition, then every scalar would
be an eigenvalue, since we must have σ(0) = 0 = λ · 0 for all λ ∈ F. On the
other hand, we do allow λ to be 0, and this is an important special case.


Remark 23.55. The words “eigenvalue” and “eigenvector” come from German, 
where eigen is a root used to denote ownership or belonging. The sense of the 
word “eigenvalue” is a value which is closely associated with, or characteristic 
of, the given linear transformation. 


Task 23.7.1. Let σ : V → V be a linear transformation. Prove that σ is
diagonalizable if and only if there is a basis of V over F which consists of
eigenvectors of σ.


Definition 23.56. A basis consisting of eigenvectors of a given linear trans- 
formation is called an eigenbasis. 


As we have come to expect, important concepts in a given algebraic cat- 
egory are often compatible with the natural structures of that category. The 
next task gives one example of this phenomenon, and shows that we have not 
yet exhausted the potential applications of the “eigen” prefix! 


Task 23.7.2. Let V be a vector space over a field F. Let σ : V → V be a linear
transformation, and let λ ∈ F. Let V_λ = {0_V} ∪ {a ∈ V | σ(a) = λa}. Prove
that V_λ ≤ V. We call V_λ the eigenspace of V corresponding to the eigenvalue
λ (with respect to σ). Note that V_λ is defined even if λ is not actually an
eigenvalue of σ, and in this case we have V_λ = {0_V}.




Until now, we have viewed a matrix as a particular, concrete way to
represent a given linear transformation. But if we instead start with a matrix
M ∈ M_n(F) without any other information, then there is a natural way to
construct a linear transformation from M. We need a definition before
explaining how to do so.


Definition 23.57. Let V be the vector space formed from the set F^n using
componentwise operations. The standard basis of V is the set {e_1, ..., e_n}
where the i-th component of e_i is 1 and the other components are 0.


Task 23.7.3. Verify that a “standard basis” really is a basis. 


Task 23.7.4. Let V = F^n under componentwise operations; let B be the
standard basis of V; and take σ(a) = Ma for a ∈ V, where we view elements
of V as column vectors. Prove that in this situation, σ : V → V is a linear
transformation, and M is the matrix of σ with respect to the basis B. Thus we
may speak of the eigenvalues, eigenvectors, diagonalizability, etc. of a square
matrix.


Definition 23.58. If σ : F^n → F^n is a linear transformation, then the
matrix of σ with respect to the standard basis of F^n is called the standard
matrix of σ.


Task 23.7.5. Let


M = [ 3  −2 ]
    [ …   … ].


Show that M is diagonalizable when viewed as an element of M_2(R), but not
when viewed as an element of M_2(Q).


Task 23.7.5 illustrates that a given matrix may become diagonalizable if we
extend the base field. But there are some matrices which are not diagonalizable
no matter how much we extend the base field, as you will see in the following
task.


Task 23.7.6. Let


M = ( … ).


Show that M is not diagonalizable over any extension field of Q.

Computing the eigenvalues of a given square matrix without a guiding
theory may quickly become a nightmare of simultaneous equation-solving. To
discover a good point of view, we first use a result from the Determinants
project.


Task 23.7.7. Let M ∈ M_n(F). Use the result of Task 23.6.9 to prove that 0 is
an eigenvalue of M iff det(M) = 0.


After this modest beginning, we need two more ideas before we will arrive
at the desired insight into eigenvalues. The first idea allows us to generalize
the result of Task 23.7.7 from 0 to any eigenvalue λ, by manipulating the
equation Mv = λv.




Task 23.7.8. Let M ∈ M_n(F) and let λ ∈ F. Use basic matrix algebra together
with the result of Task 23.7.7 to prove that λ is an eigenvalue of M iff
det(λI − M) = 0.

Now we only need one more idea to reach the great insight we are seeking.
To fully realize the potential of the result of Task 23.7.8, our idea is to replace
the field element λ with a variable x. From the explicit determinant formula
in Determinant Property 10, we see that det(xI − M) is a polynomial in F[x];
thus, the roots of this polynomial should be precisely the eigenvalues of the
matrix M. The only trouble with this reasoning is that our theory of matrices
has only allowed the entries of a matrix to come from a field, not from a more
general ring such as a polynomial ring. One way to fix this problem is to use
the field of rational functions F(x) as our starting point instead of F itself.
(In fact, there is a natural theory of matrices over any commutative ring with
1.)

Task 23.7.9. Let F be a field and n ∈ Z^+. Let K be any extension field of F.
Show that M_n(F) ≤ M_n(K) (as rings).


Definition 23.59. Let M ∈ M_n(F), where F is a field. View F as a subring
of F(x), the field of rational functions over F. The characteristic polynomial
of M is the element χ_M(x) = det(xI − M) ∈ F(x).


Task 23.7.10. Prove that we have χ_M(x) ∈ F[x], justifying the use of the word
“polynomial” in the definition of χ_M(x). More specifically, prove that χ_M(x)
is monic and of degree n.

Task 23.7.11. Let M ∈ M_n(F). Prove that the eigenvalues of M in F are
precisely the roots of χ_M(x) in F. Note that some authors are willing to allow
eigenvalues to come from any extension field L of F without mentioning L.
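
For a 2 × 2 matrix, the characteristic polynomial can be written down by hand: expanding det(xI − M) with the 2 × 2 formula gives x^2 − (a + d)x + (ad − bc) when M has rows (a, b) and (c, d). The Python sketch below uses this expansion on a sample integer matrix (the matrix, the search range, and the helper names are assumptions for illustration), finds the integer roots by trial, and checks that each root has an eigenvector.

```python
def char_poly_2x2(M):
    """Coefficients (c0, c1, c2) of det(xI - M) = c2*x^2 + c1*x + c0 for a 2 x 2 matrix M."""
    (a, b), (c, d) = M
    return (a * d - b * c, -(a + d), 1)

def det_shifted(M, lam):
    """det(lam*I - M), computed directly from the 2 x 2 formula."""
    (a, b), (c, d) = M
    return (lam - a) * (lam - d) - (-b) * (-c)

def apply(M, v):
    """The column vector M*v, returned as a tuple."""
    return tuple(sum(M[i][k] * v[k] for k in range(2)) for i in range(2))

M = [[4, 1], [2, 3]]                                   # sample matrix
coeffs = char_poly_2x2(M)                              # x^2 - 7x + 10
roots = [lam for lam in range(-10, 11) if det_shifted(M, lam) == 0]

# Candidate eigenvectors, found by solving (lam*I - M)v = 0 by hand.
image_of_v1 = apply(M, (1, -2))
image_of_v2 = apply(M, (1, 1))
```

The leading coefficient is 1 and the degree is 2, in line with Task 23.7.10, and the two roots found by trial are the scalars by which `M` stretches the two candidate eigenvectors.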

Bundled up in its characteristic polynomial are many of the key secrets of
a matrix. In many cases, knowing χ_M(x) is enough to know that M is
diagonalizable; understanding this is our next goal. We start with the result that
eigenvectors corresponding to different eigenvalues are linearly independent.


Task 23.7.12. Let M ∈ M_n(F). Let λ_1, ..., λ_k be k distinct elements of F, and
suppose that for all i ∈ {1, ..., k}, v_i is an eigenvector of M with eigenvalue
λ_i. Prove that (v_1, ..., v_k) is linearly independent over F. Suggestion: Induct
on k.


Task 23.7.13. Let M ∈ M_n(F), and suppose that χ_M(x) has n distinct roots
in F. Prove that M is diagonalizable over F.


Task 23.7.14. Prove that similar matrices (see Definition 23.40) have the same 
characteristic polynomials. 


Task 23.7.15. Suppose that F is an algebraically closed field, and let M ∈
M_2(F). Prove that M is similar to a lower triangular matrix,


M ~ L = [ L_{1,1}     0     ]
        [ L_{2,1}  L_{2,2}  ].


Show that we have χ_M(x) = (x − L_{1,1})(x − L_{2,2}), and thus that M is
diagonalizable if L_{1,1} ≠ L_{2,2}. Show that the set T of all 2 by 2 lower triangular matrices
over F is a vector subspace of M_2(F) of dimension 3 over F. Conclude that
the set of all non-diagonalizable matrices in T is a subset of a 2-dimensional
subspace of T, and is thus “small” compared to T itself.


We conclude this project with some applications of eigenvalues. 


Task 23.7.16. Define a sequence a_1, a_2, ... by a_1 = a_2 = 1 and a_{j+2} = a_{j+1} +
a_j for all j ∈ Z^+. Find a matrix M ∈ M_2(R) such that


M · (a_j, a_{j+1})^T = (a_{j+1}, a_{j+2})^T


for all j ∈ Z^+. Find the eigenvalues of M; you should find that they are two
distinct real numbers λ_1 and λ_2 where we can write 0 < |λ_1| < 1 < λ_2. Deduce
that we can write M ~ diag(λ_1, λ_2), hence that there exist real constants
c_1, c_2 such that for all j ∈ Z^+ we have a_j = c_1 λ_1^j + c_2 λ_2^j. Use the known values
of a_1, a_2, λ_1, and λ_2 to solve for c_1 and c_2. Since 0 < |λ_1| < 1, note that we
have a_j ≈ c_2 λ_2^j for large values of j; evaluate both sides for a few values of
j ≥ 3 and compare. This illustrates how eigenvalues, especially the largest
ones, control the long-term behavior of linear systems. Note that the sequence
a_1, a_2, ... is the famous Fibonacci sequence.
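
The comparison requested at the end of Task 23.7.16 is easy to carry out by machine. The Python sketch below assumes one natural choice of M, namely the matrix with rows (0, 1) and (1, 1), and uses the well-known closed form for the Fibonacci numbers (Binet's formula) to supply the larger eigenvalue and its constant c_2 = 1/√5. Readers who want to work the task unaided should derive these values first and treat the code only as a check.

```python
import math

def fib_by_matrix(count):
    """First `count` terms a_1, a_2, ..., stepping (a_j, a_{j+1})^T with M = [[0, 1], [1, 1]]."""
    M = [[0, 1], [1, 1]]               # assumed choice of M for this sketch
    pair = (1, 1)                      # (a_1, a_2)
    terms = []
    for _ in range(count):
        terms.append(pair[0])
        pair = (M[0][0] * pair[0] + M[0][1] * pair[1],
                M[1][0] * pair[0] + M[1][1] * pair[1])
    return terms

lam2 = (1 + math.sqrt(5)) / 2          # the eigenvalue larger than 1
c2 = 1 / math.sqrt(5)                  # constant taken from Binet's formula

terms = fib_by_matrix(15)
approx = [c2 * lam2 ** j for j in range(1, 16)]
errors = [abs(x - a) for x, a in zip(approx, terms)]
```

The entries of `errors` shrink as j grows, because the neglected term c_1 λ_1^j tends to 0.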


Task 23.7.17. As another example of how eigenvalues and eigenvectors occur in
practice, consider a space divided into n regions R_1 through R_n. Suppose that
at any given time, each region R_i contains a certain quantity q_i of something:
perhaps q_i measures the number of people in region R_i, in millions. We make
the following assumptions: 

(1) time is divided into intervals of a fixed length; 

(2) in any single time interval, the population in region R_j will move to
the other regions (including region R_j itself) in fixed proportions; and

(3) the “universe” U = ∪_{i=1}^n R_i forms a closed system: that is, no one can
move into U from outside of U, or vice versa.

For example, it could be that 23.1% of the people in region R_1 move from
region R_1 to region R_2 in each time interval. In general, let M_{i,j} be the fraction
of people moving from region R_j to region R_i in each time interval. Let M
be the n × n matrix whose entries are the numbers M_{i,j}. Note that we should
have Σ_{i=1}^n M_{i,j} = 1 for each j ∈ {1, ..., n}. Use this fact to prove that 1 is
an eigenvalue of M. Then prove that 1 is the only possible real eigenvalue of
M whose eigenvectors have non-negative entries, as our assumptions require.
Note that an eigenvector with eigenvalue 1 corresponds to a “steady-state”
distribution of people throughout the n regions, that is, a distribution which
will not change over time. Find such an eigenvector for the particular matrix


M = [ 0.3   0.25  0.4 ]
    [ 0.5   0.3   0.1 ]
    [ 0.2   0.45  0.5 ].




We note that the type of model described in this task was the starting point 
for Larry Page and Sergey Brin’s PageRank® algorithm, which was the foun- 
dation of the Google search engine when it was launched commercially in 1998; 
in this case, the regions represent web pages. 
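
A steady-state distribution for the matrix in Task 23.7.17 can be approximated by simply running the model for many time intervals. The Python sketch below does this by repeated multiplication, starting from a uniform distribution; the starting vector, the number of iterations, and the function names are arbitrary choices made for this sketch. Repeated multiplication is only one way to approximate an eigenvector with eigenvalue 1, and the task itself can be solved exactly by hand.

```python
M = [[0.3, 0.25, 0.4],
     [0.5, 0.3, 0.1],
     [0.2, 0.45, 0.5]]

def step(M, q):
    """One time interval: new_q[i] is the sum over j of M[i][j] * q[j]."""
    return [sum(M[i][j] * q[j] for j in range(len(q))) for i in range(len(M))]

def steady_state(M, iterations=200):
    """Approximate a steady-state distribution by iterating the model."""
    n = len(M)
    q = [1.0 / n] * n                  # uniform starting distribution (arbitrary)
    for _ in range(iterations):
        q = step(M, q)
    return q

q = steady_state(M)
change = max(abs(x - y) for x, y in zip(step(M, q), q))
```

Because every column of `M` sums to 1, each step preserves the total population, so `q` remains a distribution throughout, and `change` measures how far `q` is from being exactly fixed by one more time interval.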

In case there is still any doubt whether the material in this project is im- 
portant, we note that eigenvectors appear in the list of axioms of quantum 
mechanics, one of the fundamental theories of physics that describes the na- 
ture of matter. Namely, the state of an object is described by a vector; each 
measurable property p of the object has an associated linear transformation
t_p; and when an observation of the value of p is made, the state of the object
instantly changes to become an eigenvector of t_p whose eigenvalue is precisely
the measured value of p. Just before the measurement is made, the state of 
the object can be a more general vector, a linear combination of eigenvec- 
tors, which indicates that the object does not have a definite single value for 
property p until we measure it—one of the weird facts of quantum physics! 

Many other applications of eigenvalues exist, including one referenced in 
the quotation at the start of this project, namely, a ranking algorithm for 
academic journals. In this situation, published articles form the “regions” and 
citations form the “movement” across regions. The goal is to measure the 
impact that any given article makes within its own subject area. 


23.8 Linear Algebra: Rotations 


Prerequisites. Chapter 13 and Project 23.6. 


For which collections of vectors in $\mathbb{R}^n$ is it possible to rotate the vectors
about the origin simultaneously so that they all have non-negative coordinates?
To be specific, let us use the standard basis for $\mathbb{R}^n$ over $\mathbb{R}$, and
represent a vector as a column of $n$ real numbers with respect to this basis. In order
to work on this question, we first need to agree what we mean by a “rotation.”
At least in dimensions $n \le 3$, the reader is familiar with the geometric idea of
rotation about a point or about an axis; we will not discuss these notions for
now, except to state that we would like our rotations to send the origin of $\mathbb{R}^n$
to itself. Since a rotation would seem to involve “curving” as a basic necessity,
some readers may find it slightly disconcerting that every rotation which fixes
the origin should be a linear transformation; but the mystery clears up when
we realize that, after all, a rotation should send lines to lines.

We would like to define a “rotation” to be a linear transformation from
$\mathbb{R}^n$ to itself which preserves distances and angles; this leaves us to find an
algebraic way to measure both of these quantities. To build our intuition,
let’s start with rotations in $\mathbb{R}^2$, where we already have some experience from
the discussion of rotational symmetries in Chapter 5. We will consider the
word “rotation” to be somewhat informal, and later replace it with a suitable
general definition and a new term.




Task 23.8.1. Let $\alpha$ be a real number. Verify that Equation 5.1 represents the
counterclockwise rotation $R = R_\alpha$ about the origin by $\alpha$ radians, even if $\alpha$
does not have the special form $2\pi/n$ where $n \in \mathbb{Z}^+$. Then prove that $R_\alpha$ is a
linear transformation from $\mathbb{R}^2$ to $\mathbb{R}^2$, and find the standard matrix of $R_\alpha$.

Next, we will investigate linear transformations which preserve distances.
We recall that the distance from $(0,0)$ to $(v_1, v_2)$ in $\mathbb{R}^2$ is $\sqrt{v_1^2 + v_2^2}$. If the
reader has not studied distance formulas in higher dimensions, then it may
be surprising to see how the two-dimensional distance formula generalizes:
instead of taking a sum of cubes followed by a cube root in three dimensions,
we just keep adding squares of the components and taking the square root.

Task 23.8.2. Show geometrically that the distance from $(0,0,0)$ to $(v_1, v_2, v_3)$
in $\mathbb{R}^3$ is equal to $\sqrt{v_1^2 + v_2^2 + v_3^2}$. Suggestion: Construct a right triangle in
the $x,y$-plane with vertices $(0,0,0)$, $(v_1,0,0)$, and $(v_1, v_2, 0)$ and another right
triangle perpendicular to the $x,y$-plane with vertices $(0,0,0)$, $(v_1, v_2, 0)$, and
$(v_1, v_2, v_3)$.

The following definition captures the general distance formula using the 
notion of length; the idea is that the length of a vector is equal to the distance 
from the tail to the head. Also note that since we are viewing our vectors as 
columns, the transpose of a vector is a row; this is more convenient to fit on 
the page. 


Definition 23.60. Let $v = (v_1,\dots,v_n)^T \in \mathbb{R}^n$. The length of $v$ is the real
number

$$|v| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}. \tag{23.8}$$

The distance between $v$ and another vector $w \in \mathbb{R}^n$ is the real number $|v - w|$.


Now we can start to be more precise about the desired properties of a 
general rotation. 


Definition 23.61. Let $t : \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation. We say
that $t$ preserves lengths if for all $v \in \mathbb{R}^n$, we have $|t(v)| = |v|$. We say that $t$
preserves distances if for all $v, w \in \mathbb{R}^n$, we have $|t(v - w)| = |v - w|$.


Task 23.8.3. Show that a linear transformation from $\mathbb{R}^n$ to itself preserves
distances iff it preserves lengths.

Even though Task 23.8.3 suggests that we should focus on length- 
preserving transformations, since their definition only involves one arbitrary 
choice of vector instead of two, it turns out that there is useful information 
to be learned by expanding the formula for distances: 


Task 23.8.4. Let $v = (v_1,\dots,v_n), w = (w_1,\dots,w_n) \in \mathbb{R}^n$. Show that we have

$$|v - w|^2 = |v|^2 + |w|^2 - 2p(v, w) \tag{23.9}$$

where $p(v,w) = \sum_{i=1}^{n} v_i w_i$. Deduce that if $R : \mathbb{R}^n \to \mathbb{R}^n$ is a linear
transformation that preserves lengths, then for all $v, w \in \mathbb{R}^n$ we must have
$p(Rv, Rw) = p(v,w)$. Also note that we have $p(v,w) = v^T w = w^T v$, interpreting
a $1 \times 1$ matrix as a real number.
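Equation 23.9 can be checked on sample vectors before attempting the proof. The following Python sketch does so with exact rational arithmetic; the helper names `p` and `length_sq` and the sample vectors are assumptions chosen for illustration.

```python
from fractions import Fraction

def p(v, w):
    """Sum of products of corresponding components, as in Task 23.8.4."""
    return sum(a * b for a, b in zip(v, w))

def length_sq(v):
    """Square of the length from Definition 23.60 (sum of squared components)."""
    return sum(a * a for a in v)

v = [Fraction(1), Fraction(-2), Fraction(3)]
w = [Fraction(4), Fraction(0), Fraction(-1, 2)]
diff = [a - b for a, b in zip(v, w)]

lhs = length_sq(diff)                                   # |v - w|^2
rhs = length_sq(v) + length_sq(w) - 2 * p(v, w)         # |v|^2 + |w|^2 - 2 p(v, w)
```

Both sides evaluate to $101/4$ for these vectors. A numerical check of this kind is of course no substitute for the algebraic proof requested in the task.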




The function which we labeled p in Task 23.8.4 above is evidently im- 
portant: it must be preserved by any linear transformation which preserves 
distances or lengths. Therefore we will study the properties of p more closely. 
We begin by assigning p its traditional (though unimaginative) name; also, 
we generalize from R to an arbitrary field since the formula for p still makes 
sense there. 


Definition 23.62. Let $F$ be a field. The dot product on $F^n$ is the function
$\cdot : F^n \times F^n \to F$ given by the formula $v \cdot w = v^T w$.

Remark 23.63. The dot product is also known as the scalar product, since its
range consists of scalars, i.e., field elements.


Task 23.8.5. Let $F$ be a field, and let $\cdot$ be the dot product on $F^n$. Prove that
for all $u, v, w \in F^n$ and all $c \in F$, the following properties hold:

IP1. $u \cdot (v + w) = u \cdot v + u \cdot w$ and $(u + v) \cdot w = u \cdot w + v \cdot w$.

IP2. $c(u \cdot v) = (cu) \cdot v = u \cdot (cv)$.

IP3. $u \cdot v = v \cdot u$.

The reader may be wondering why we used the prefix “IP” in naming the 
properties of the dot product. This is because we will use the same properties 
as axioms to define a more general type of function called an inner product. 


Definition 23.64. Let $V$ be a vector space over a field $F$. An inner product
on $V$ is a function $\cdot : V^2 \to F$ satisfying properties IP1 and IP2. An inner
product is called symmetric if it also satisfies IP3.


Task 23.8.6. Let $V$ be an $n$-dimensional vector space over a field $F$, with
ordered basis $B = (b_1,\dots,b_n)$.

(a) Suppose that $\cdot$ is an inner product on $V$. Let $A \in M_n(F)$ be the
matrix whose entry in row $i$ and column $j$ is $a_{i,j} = b_i \cdot b_j$. Prove that for all
$v, w \in V$, we have $v \cdot w = c_v^T A c_w$, where $c_v$ and $c_w$ denote the coordinate
vectors representing $v$ and $w$ with respect to $B$ (as column vectors). Further,
prove that $\cdot$ is symmetric iff $A$ is symmetric.

(b) Conversely, prove that if $A \in M_n(F)$, then the formula $v \cdot w = c_v^T A c_w$
defines an inner product on $V$, which is symmetric iff $A$ is symmetric. Conclude
that there is a bijective correspondence between inner products on $V$ and $n \times n$
matrices over $F$, with symmetric inner products corresponding to symmetric
matrices.

(c) Show that the matrix corresponding to the dot product on $F^n$ with
the standard basis is the identity matrix.


Definition 23.65. Let $V$ be an $n$-dimensional vector space over a field $F$,
with ordered basis $B = (b_1,\dots,b_n)$. Let $\cdot$ be an inner product on $V$. Then
the matrix of $\cdot$ with respect to $B$ is the matrix $A \in M_n(F)$ given by the
correspondence in Task 23.8.6.
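To make the correspondence of Task 23.8.6 concrete, the following Python sketch evaluates $c_v^T A c_w$ for coordinate vectors given as lists. The function name `inner_from_matrix` and the sample matrices are hypothetical, chosen only to show a symmetric and a non-symmetric case.

```python
def inner_from_matrix(A, cv, cw):
    """Return cv^T A cw, where A is a list of rows and cv, cw are coordinate lists."""
    n = len(A)
    return sum(cv[i] * A[i][j] * cw[j] for i in range(n) for j in range(n))

identity = [[1, 0], [0, 1]]   # matrix of the dot product in the standard basis
A_sym = [[2, 1], [1, 3]]      # a symmetric matrix
A_non = [[0, 1], [0, 0]]      # a non-symmetric matrix

v, w = [1, 2], [3, -1]
dot_vw = inner_from_matrix(identity, v, w)   # ordinary dot product of v and w
sym_vw = inner_from_matrix(A_sym, v, w)
sym_wv = inner_from_matrix(A_sym, w, v)
non_vw = inner_from_matrix(A_non, v, w)
non_wv = inner_from_matrix(A_non, w, v)
```

With the symmetric matrix the two orders of the arguments agree on this sample, while with the non-symmetric matrix they differ, in line with the statement that the inner product is symmetric iff its matrix is.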


Next we single out one additional property of the dot product, which allows 
us to recover lengths and distances: 




Task 23.8.7. Show that for all $v \in \mathbb{R}^n$, we have $|v| = \sqrt{v \cdot v}$, where $\cdot$ is the dot
product. Conclude that we can express the distance between any two vectors
of $\mathbb{R}^n$ in terms of the dot product.


In exploring the consequences of preserving distances, we have not yet
considered what it means to preserve angles. Luckily, the work we already
did allows us to show that preserving angles is a consequence of preserving
distances. Specifically, let us consider Equation 23.9. This equation reminds
us of the Law of Cosines in trigonometry (see Figure 23.8). If we identify $O$
as the origin, and draw vector $v$ from $O$ to $P$ and vector $w$ from $O$ to $Q$,
then $v - w$ is the vector from $Q$ to $P$, and we have $|v| = |a|$, $|w| = |b|$, and
$|v - w| = |c|$. Thus everything fits perfectly, suggesting that we should have
$v \cdot w = |v|\,|w| \cos(\theta)$, where $\theta$ is the angle between $v$ and $w$. Since geometry
is outside of our scope, instead of accepting this equation at face value, we
would like to use it to define the angle between two vectors: note that the
remaining quantities can be computed using algebra! But to be careful, since
the range of the cosine function is the interval $[-1,1]$, we should first show
that the ratio $(v \cdot w)/(|v|\,|w|)$ always lies in this interval whenever $|v|$ and $|w|$ are
not zero. That is, we would like to prove the following theorem:


FIGURE 23.8: The Law of Cosines: $|c|^2 = |a|^2 + |b|^2 - 2\,|a|\,|b| \cos(\theta)$


Theorem 23.66. Let $v, w \in \mathbb{R}^n - \{0\}$, where $n \in \mathbb{Z}^+$. Let $\cdot$ be the dot product
on $\mathbb{R}^n$. Then we have

$$-|v|\,|w| \le v \cdot w \le |v|\,|w|, \tag{23.10}$$

with equality on the left iff $w/|w| = -v/|v|$, and equality on the right iff $w/|w| = v/|v|$.

First we reduce to the case when both vectors have length 1.


Definition 23.67. A unit vector is a vector in $\mathbb{R}^n$ of length 1. The set of all
unit vectors in $\mathbb{R}^n$ will be denoted $UV(n)$.


Task 23.8.8. Prove that Theorem 23.66 is true if and only if it is satisfied for
all unit vectors $v$ and $w$.


Next we examine the form of unit vectors. 




Task 23.8.9. (a) Show that $UV(1) = \{1, -1\}$, if we identify $\mathbb{R}^1$ with $\mathbb{R}$.

(b) Show that $UV(2) = \{(\cos(\theta), \sin(\theta)) : 0 \le \theta < 2\pi\}$. Further, show that
the function $f : [0, 2\pi) \to UV(2)$ given by the formula $\theta \mapsto (\cos(\theta), \sin(\theta))$ is
bijective.

(c) Prove by induction that for all $n \ge 2$, $UV(n)$ is the set of all $u \in \mathbb{R}^n$ of
the form

$$u = (\cos(\theta_1),\ \sin(\theta_1)\cos(\theta_2),\ \sin(\theta_1)\sin(\theta_2)\cos(\theta_3),\ \dots,\ \sin(\theta_1)\cdots\sin(\theta_{n-1}))$$

where $\theta_i \in [0, 2\pi)$; that is, $u = (u_1, u_2, \dots, u_n)$ where $u_1 = \cos(\theta_1)$, $u_2 =
\sin(\theta_1)\cos(\theta_2)$, and in general, $u_j = \sin(\theta_1)\sin(\theta_2)\cdots\sin(\theta_{j-1})\cos(\theta_j)$ if $1 \le
j < n$, with final component $u_n = \sin(\theta_1)\sin(\theta_2)\cdots\sin(\theta_{n-1})$. Note that we
no longer have uniqueness of the angles $\theta_i$ when $n > 2$.
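The parametrization in part (c) can be explored numerically. The Python sketch below builds a vector from a list of angles, following the pattern "product of the earlier sines times the cosine of the current angle", and then computes its length. The function name `unit_vector` and the sample angles are assumptions made for illustration.

```python
import math

def unit_vector(angles):
    """Build u in R^n (n = len(angles) + 1) from angles theta_1, ..., theta_{n-1}."""
    u = []
    sin_product = 1.0   # running product sin(theta_1) ... sin(theta_{j-1})
    for theta in angles:
        u.append(sin_product * math.cos(theta))
        sin_product *= math.sin(theta)
    u.append(sin_product)   # final component: product of all the sines
    return u

u = unit_vector([0.3, 1.1, 2.5, 4.0])          # a vector in R^5
length = math.sqrt(sum(x * x for x in u))       # should be 1 up to rounding
```

This only checks one direction of the claim (every vector of the given form is a unit vector); showing that every unit vector arises in this way is the substance of the induction.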

Task 23.8.10. Prove Theorem 23.66 by induction on the dimension $n$. You
will only need basic trigonometric identities together with the results of the
previous tasks; you may also find useful the fact that a real-valued linear function
whose domain is a closed interval must have its extreme values at the
endpoints of the domain.

We are now in a position to measure the angle between two non-zero
vectors in $\mathbb{R}^n$.


Definition 23.68. Let $v, w \in \mathbb{R}^n - \{0\}$. The angle between $v$ and $w$ is the
unique real number $\theta$ in the interval $[0, \pi]$ such that $\cos(\theta) = (v \cdot w)/(|v|\,|w|)$,
where $\cdot$ is the dot product.
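Definition 23.68 translates directly into a computation. The following Python sketch is one way to carry it out in floating-point arithmetic; the function names `dot` and `angle` and the sample vectors are assumptions made for this example.

```python
import math

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def angle(v, w):
    """Angle in [0, pi] between non-zero vectors v and w."""
    c = dot(v, w) / (math.sqrt(dot(v, v)) * math.sqrt(dot(w, w)))
    # Rounding can push c slightly outside [-1, 1]; clamp before acos.
    c = max(-1.0, min(1.0, c))
    return math.acos(c)

theta_right = angle([1, 0, 0], [0, 2, 0])    # orthogonal vectors
theta_opp = angle([1, 2, 3], [-2, -4, -6])   # opposite directions
theta_45 = angle([1, 0], [1, 1])
```

The clamping step is a floating-point precaution only; Theorem 23.66 guarantees that the exact ratio already lies in $[-1, 1]$.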


A special case of interest is when two vectors meet at right angles: 


Definition 23.69. Let $v, w \in \mathbb{R}^n$. We say that $v$ and $w$ are orthogonal (to
each other) if $v \cdot w = 0$.


Remark 23.70. When $v$ and $w$ are non-zero, then to say that $v$ and $w$ are
orthogonal means that the angle between them is $\pi/2$. But by our definition,
the zero vector is orthogonal to every vector, even though angles are not
defined in this case.


The following definition is now amply motivated: 


Definition 23.71. Let $t : \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation. We say
that $t$ preserves angles if $t$ is injective and for all $v, w \in \mathbb{R}^n - \{0\}$, we have

$$\frac{t(v) \cdot t(w)}{|t(v)|\,|t(w)|} = \frac{v \cdot w}{|v|\,|w|}, \tag{23.11}$$

where $\cdot$ is the dot product.


Task 23.8.11. Let $t : \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation. Prove that if $t$
preserves distances, then $t$ preserves angles.




Now we have established the following result: 


Proposition 23.72. Let $t : \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation. If $t$
preserves lengths, then $t$ preserves distances and angles.


As promised, now that we are in a position to formally define the idea of 
a “rotation,” we will introduce a new word: 


Definition 23.73. A linear transformation $t : \mathbb{R}^n \to \mathbb{R}^n$ is called an orthogonal
transformation (of $\mathbb{R}^n$) if $t$ preserves lengths.


Proposition 23.72 is stated in plain language, at what we might call a 
“conceptual” level, but we have also built enough theory to reformulate this 
result in a single, elegant equation. 


Task 23.8.12. Let $t$ be an orthogonal transformation of $\mathbb{R}^n$. Let $B =
(e_1,\dots,e_n)$ be the ordered standard basis of $\mathbb{R}^n$, and let $\cdot$ be the dot product.

(a) Verify that for all $i, j \in \{1,\dots,n\}$, we have

$$e_i \cdot e_j = \begin{cases} 1, & \text{if } i = j; \\ 0, & \text{if } i \ne j. \end{cases}$$

(b) Let $A$ be the matrix of $t$ with respect to $B$. Recall that the columns of
$A$ are the coordinate vectors of $t(e_i)$ with respect to $B$. Verify that the entry
in row $i$ and column $j$ of the product $A^T A$ is the value $t(e_i) \cdot t(e_j)$.

(c) Conclude that $A^T A = I$, the $n \times n$ identity matrix.

(d) Now let $s : \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation whose standard
matrix $C$ satisfies the equation $C^T C = I$. Prove that for all $i, j \in \{1,\dots,n\}$,
we have $s(e_i) \cdot s(e_j) = e_i \cdot e_j$. Use the properties of the dot product to extend
this formula to an arbitrary pair of vectors in $\mathbb{R}^n$. Use these results to complete
the proof of Proposition 23.74 below.


Proposition 23.74. Let $t : \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation, and let
$A$ be the standard matrix of $t$. Then $t$ is an orthogonal transformation if and
only if

$$A^T A = I. \tag{23.12}$$


Because Equation 23.12 makes sense when A is a square matrix over any 
field, we use that equation to define a type of matrix which we may view as a 
generalization of a rotation matrix: 


Definition 23.75. Let $F$ be a field and let $n \in \mathbb{Z}^+$. A matrix $A \in M_n(F)$ is
called an orthogonal matrix if $A^T A = I$, where $I$ is the $n \times n$ identity matrix
in $M_n(F)$.
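Since the defining equation involves only field arithmetic, it can be tested exactly over the rational numbers. The following Python sketch checks $A^T A = I$ entry by entry; the helper names and the three sample matrices are assumptions chosen for illustration.

```python
from fractions import Fraction

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def is_orthogonal(A):
    """Check the defining equation A^T A = I entry by entry."""
    n = len(A)
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return matmul(transpose(A), A) == identity

rotation_like = [[Fraction(3, 5), Fraction(-4, 5)],
                 [Fraction(4, 5), Fraction(3, 5)]]
permutation = [[0, 1, 0],
               [0, 0, 1],
               [1, 0, 0]]
shear = [[1, 1],
         [0, 1]]

results = [is_orthogonal(rotation_like), is_orthogonal(permutation), is_orthogonal(shear)]
```

The first two matrices satisfy the equation and the shear does not. Exact fractions are used so that equality with the identity matrix is meaningful; with floating-point entries one would compare up to a tolerance instead.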


Now that we have a simple formula defining orthogonal matrices, we can 
start to explore their properties. 


Task 23.8.13. Let $A$ be an $n \times n$ orthogonal matrix with entries in a field $F$.
Prove that we must have $\det(A) \in \{1_F, -1_F\}$. Conclude that $A \in GL_n(F)$
and that $A^{-1} = A^T$.




Notation 23.76. The set of all $n \times n$ orthogonal matrices over a field $F$ is
denoted $O_n(F)$. The subset of $O_n(F)$ consisting of matrices with determinant
1 is denoted $SO_n(F)$.


As an important and naturally defined subset of a group, we expect that 
the set of all orthogonal matrices of a given size over a given field should be a 
subgroup of that group. This is true; to prove it, we first establish a property 
of transposes: 


Lemma 23.77. Let $F$ be a field, and let $A$ and $B$ be matrices over $F$ such
that the product $AB$ is defined (see Exercise 13.16). Then the product $B^T A^T$
is also defined, and we have $(AB)^T = B^T A^T$.


Task 23.8.14. Prove Lemma 23.77.


Task 23.8.15. Let $F$ be a field, and let $n \in \mathbb{Z}^+$. Prove that $O_n(F) \le GL_n(F)$.
Then prove that $SO_n(F)$ is a normal subgroup of $O_n(F)$ of index 2 if
$\operatorname{char}(F) \ne 2$. Note that if $\operatorname{char}(F) = 2$, then $SO_n(F) = O_n(F)$.

Now we are justified in naming these matrix groups: $O_n(F)$ is called the
orthogonal group of $n \times n$ matrices over $F$, and $SO_n(F)$ is the special orthogonal
group.

Task 23.8.16. Using the notation of Task 23.8.1, let $M_\alpha$ be the standard
matrix of $R_\alpha$. Show that we have

$$SO_2(\mathbb{R}) = \{M_\alpha : 0 \le \alpha < 2\pi\}.$$

Thus, the special orthogonal group over $\mathbb{R}$ represents “true” rotations, at
least in dimension two. Now let $F$ be the flip about the $x$-axis defined by
Equation 5.2, and let $L$ be the standard matrix of $F$. Verify that $L \in O_2(\mathbb{R})$
but $L \notin SO_2(\mathbb{R})$. Conclude that $O_2(\mathbb{R}) = SO_2(\mathbb{R}) \cup L \cdot SO_2(\mathbb{R})$.


Remark 23.78. Task 23.8.16 gives an idea of how an orthogonal transformation
is more general than a rotation. Notice that we can accomplish the flip $F$ by
performing a rotation in $\mathbb{R}^3$ of 180° about the $x$-axis, but the $2 \times 2$ matrix
$L$ fails to capture the information that the $z$-axis has also been flipped! This
is where the “missing” $-1$ can be found that would allow $F$, and hence all of
$O_2(\mathbb{R})$, to consist of actual rotations with determinant 1 only.

The next two tasks help to develop some insight about the orthogonal
groups by examining their relationships to each other and to the symmetric
groups $S_n$.

Task 23.8.17. Let $F$ be a field, and let $n \in \mathbb{Z}^+$. Define a function
$\sigma : O_n(F) \to M_{n+1}(F)$ by the formula

$$\sigma(M) = \begin{pmatrix} M & 0 \\ 0 & 1 \end{pmatrix} \tag{23.13}$$

for $M \in O_n(F)$. Prove that we have $\sigma(M) \in O_{n+1}(F)$, and that $\sigma$ is an
embedding of $O_n(F)$ into $O_{n+1}(F)$.

Remark 23.79. The matrix $\sigma(M)$ in Task 23.8.17 is an example of what is
called a “block-diagonal” matrix. The strategically placed 0s effectively separate
the action of the top-left block, $M$, and the bottom-right block, 1, into
the spaces $F^n$ and $F$, where $F$ is identified with its natural embedding into
the last component of $F^{n+1}$.

Task 23.8.18. Let $F$ be a field, and let $n \in \mathbb{Z}^+$. For $f \in S_n$, let $E_f$ be the
matrix whose entries in row $i$, column $f(i)$ are 1 for all $i \in \{1,2,\dots,n\}$, and
whose other entries are 0, as in Task 23.6.16. Prove that for all $f \in S_n$, we have
$E_f \in O_n(F)$. Conclude that the map V from Task 23.6.16 is an embedding of
$S_n$ in $O_n(F)$. Note that this implies that we can permute the components of a
vector in $F^n$ however we like by applying an appropriate orthogonal matrix.


Next, we will explore some of the “power” of orthogonal transformations:
given a vector $v \in \mathbb{R}^n$, for example, what are the possible output values $Mv$
when $M \in O_n(\mathbb{R})$? We know that $Mv$ must have the same length as $v$, but
are there any other restrictions?


Task 23.8.19. Let $n \in \mathbb{Z}^+$ and let $v \in \mathbb{R}^n$. Prove that there exists $M \in O_n(\mathbb{R})$
such that $Mv = |v|\, e_1$. Suggestion: Induct on $n$. The results of Tasks 23.8.17
and 23.8.18 may be useful here.
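For readers who want to experiment before writing the proof, the Python sketch below constructs one orthogonal matrix that sends a given vector to $|v|\, e_1$. It uses a reflection (often called a Householder reflection) rather than the inductive argument suggested in the task, so it should be read as an independent numerical illustration; the function names are assumptions, and the matrix produced has determinant $-1$ whenever it is not the identity.

```python
import math

def householder_to_e1(v):
    """Return an orthogonal matrix H (list of rows) with H v = |v| e_1."""
    n = len(v)
    norm = math.sqrt(sum(x * x for x in v))
    u = list(v)
    u[0] -= norm                       # u = v - |v| e_1
    uu = sum(x * x for x in u)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    if uu == 0.0:                      # v is already |v| e_1 (or v = 0)
        return identity
    # H = I - 2 u u^T / (u^T u): reflection across the hyperplane orthogonal to u
    return [[identity[i][j] - 2.0 * u[i] * u[j] / uu for j in range(n)]
            for i in range(n)]

def apply(H, v):
    return [sum(h * x for h, x in zip(row, v)) for row in H]

H = householder_to_e1([3.0, 4.0])
image = apply(H, [3.0, 4.0])           # expected to be close to (5, 0)
```

This formula can lose accuracy in floating-point arithmetic when $v$ is already very close to a positive multiple of $e_1$; the sketch makes no attempt to handle that case carefully.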


Task 23.8.19 shows that any vector can be moved so that it “points along
the positive $x$-axis” by applying an appropriate orthogonal matrix (if we think
of the first component of a vector as the $x$ component). This agrees with
our intuition from $O_2(\mathbb{R})$. Even more, our intuition suggests that any list
of $m$ vectors in $\mathbb{R}^n$ can be simultaneously rotated to “fit” into the first $m$
components. To realize this idea, we will write the list of vectors as columns
of a matrix $C$; then computing the product $AC$ has the effect of simultaneously
applying $A$ to every column of $C$.


Task 23.8.20. Let $m, n \in \mathbb{Z}^+$ and let $C \in M_{n,m}(\mathbb{R})$. Prove by induction on
$m$ that there exists a matrix $A \in O_n(\mathbb{R})$ such that $(AC)_{j,k} = 0$ whenever
$j > k$, where $(AC)_{j,k}$ denotes the entry of $AC$ in row $j$, column $k$. Conclude
as a special case the result of Proposition 23.80 below. Suggestion: Make use
of the result of Task 23.8.19 and appropriate block-diagonal matrices.


Proposition 23.80. For every $C \in M_n(\mathbb{R})$, there exists $A \in O_n(\mathbb{R})$ such
that $AC$ is an upper triangular matrix.


Although not directly helpful to solving our main problem, the next result 
is important enough to state and prove, as it follows readily from our current 
knowledge. 


Theorem 23.81. Let $C \in GL_n(\mathbb{R})$, and let $(c_1, c_2, \dots, c_n)$ be the list of
columns of $C$. Then $|\det(C)| \le |c_1| \cdot |c_2| \cdots |c_n|$, with equality iff $c_i$ is
orthogonal to $c_j$ for all $i \ne j$.




Task 23.8.21. Prove Theorem 23.81. Suggestion: Make use of Proposition 
23.80 and Task 23.6.25. 


Now that we know a little about rotations, we return to the original question
raised at the opening of this project. Let’s make things a bit easier by
considering only a finite list of column vectors $\mathcal{L} = (c_1,\dots,c_m)$ with $c_i \in \mathbb{R}^n$.
The result of “rotating a vector $c_i$ about the origin” we interpret as the vector
$Ac_i$ for some $A \in O_n(\mathbb{R})$. To consider the entire list $\mathcal{L}$ at once, we can, as
suggested above, write each of the columns $c_i$ from left to right to produce a
matrix

$$C = \begin{pmatrix} c_1 & c_2 & \cdots & c_m \end{pmatrix} \in M_{n,m}(\mathbb{R}). \tag{23.14}$$

Then the results of simultaneously rotating all of the vectors in $\mathcal{L}$ using $A$
are the columns of the matrix $AC \in M_{n,m}(\mathbb{R})$. So our question becomes: For
which matrices $C \in M_{n,m}(\mathbb{R})$ does there exist a matrix $A \in O_n(\mathbb{R})$ such that
the entries of $AC$ are all non-negative?

To explore this question, let us look at the relationship between $C$ and $AC$.
Since $A$ is an orthogonal matrix, we expect that each column of $AC$ should
have the same length (as a column vector) as the corresponding column of
$C$, and furthermore that the dot product of any two columns of $AC$ should
be the same as the dot product of the corresponding columns of $C$. As we
saw in Task 23.8.12, these dot products are just the entries of the matrices
$(AC)^T (AC)$ and $C^T C$, respectively. Therefore, we expect these two matrices
to be equal:


Task 23.8.22. Let $C \in M_{n,m}(\mathbb{R})$ and $A \in O_n(\mathbb{R})$. Prove that we must have
$(AC)^T (AC) = C^T C$. (This is a straightforward calculation using the previous
results, in particular Lemma 23.77.)

The result of Task 23.8.22 gives us a test which allows us to show easily, in
some cases at least, that two $n \times m$ matrices are not related by an orthogonal
transformation: namely, if $C^T C \ne D^T D$, then there can be no orthogonal
matrix $A$ such that $D = AC$. This is an example of an important phenomenon
in mathematics, so we will discuss it further now. The next task helps to put
the current situation in a more general framework.


Task 23.8.23. Define a relation $\sim$ on $S := M_{n,m}(\mathbb{R})$ by the condition $C \sim D$
iff there exists $A \in O_n(\mathbb{R})$ such that $D = AC$. Prove that $\sim$ is an equivalence
relation on $S$. Define a function $f : S \to M_m(\mathbb{R})$ by the formula $f(C) = C^T C$.
Prove that if $C, D \in S$ and $C$ is in the same equivalence class as $D$, then
$f(C) = f(D)$.
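The function $f(C) = C^T C$ is cheap to compute, which is what makes it useful as a test. The Python sketch below compares its values on three small matrices; the function name `gram` and the sample matrices are assumptions chosen for illustration.

```python
def gram(C):
    """Return C^T C for a matrix C given as a list of rows."""
    cols = list(zip(*C))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

C = [[1, 0],
     [0, 1]]    # columns e_1, e_2
D = [[1, 1],
     [0, 0]]    # columns e_1, e_1
E = [[0, -1],
     [1, 0]]    # C with both columns turned by a quarter turn

same_value_CD = gram(C) == gram(D)
same_value_CE = gram(C) == gram(E)
```

Here `gram(C)` and `gram(D)` differ, so no orthogonal matrix can carry `C` to `D`. The values for `C` and `E` agree, which is consistent with `E` being a rotation of `C`; note that at this point in the text, equal values of $f$ have not been shown to force two matrices to be related.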

Again, the significance of the function $f$ in Task 23.8.23 is that it provides
an easy way to potentially show that two elements in a set are not related
according to $\sim$: it is easy to compute $C^T C$ and $D^T D$ from $C$ and $D$ and
compare these two matrices, but it may not be easy to compute every possible
value of the matrix $AC$ as $A$ runs through all elements of $O_n(\mathbb{R})$ in order to
compare each $AC$ to $D$! In this situation, we say that $f$ is an invariant of $\sim$.
A general definition of this concept follows.




Definition 23.82. Let $S$ be a set with an equivalence relation $\sim$. A function
$f : S \to T$ (for some set $T$) is called an invariant of $\sim$ if for all $a, b \in S$
we have $a \sim b \implies f(a) = f(b)$. If $f : S \to T$ and $g : S \to U$ are two
invariants of the same equivalence relation $\sim$, then we say that $f$ is finer than
$g$ if for every $t \in T$, there exists $u \in U$ such that $f^{-1}(t) \subseteq g^{-1}(u)$. Instead of
saying that $f$ is finer than $g$, we may say that $g$ is coarser than $f$.


Remark 23.83. A finer invariant gives more information, while a coarser invariant
gives less. The finest possible invariants are those that have distinct
values on different equivalence classes. If we define the “quotient set” $S/{\sim}$ to
be the set of all equivalence classes of $S$ under $\sim$, then an invariant of $\sim$ is the
same thing as a function with domain $S$ which naturally induces a function
on $S/{\sim}$. In this view, the finest invariants are those that induce injective
functions on $S/{\sim}$.


Task 23.8.24. Assume the notation of Task 23.8.23, with $m = n$. Define a
function $g : M_n(\mathbb{R}) \to \mathbb{R}$ by the formula $g(C) = (\det(C))^2$. Show that $g$ is
an invariant of $\sim$, and that $g$ is coarser than $f$.


Again we return to our main question. Our strategy will use the invariant
$f$ of Task 23.8.23, $f(C) = C^T C$. From now on, let $\sim$ denote the relation from
that task.


Definition 23.84. Let $C \in M_{n,m}(\mathbb{R})$. We will say that $C$ is rotatable into a
non-negative matrix if $C \sim D$ for some matrix $D$ with non-negative entries.


To give the reader an example of using the invariant f, we prove the 
following result: 


Lemma 23.85. Let $C \in M_{n,m}(\mathbb{R})$. If $C$ is rotatable into a non-negative
matrix, then all pairs of non-zero column vectors in $C$ meet at angles of at
most 90°.


Proof. Suppose that $C \sim D$ where $D$ has non-negative entries. Then certainly
$D^T D$ also has non-negative entries. But also, we have $f(C) = f(D)$, so $C^T C$
has non-negative entries. Notice that this condition says exactly that all pairs
of column vectors in $C$ meet at angles of at most 90° whenever the angles are
defined, that is, for non-zero columns of $C$.


A natural question is whether the converse of Lemma 23.85 holds. 


Task 23.8.25. Prove the converse of Lemma 23.85 in the case $n = 2$. Suggestion:
Let $C \in M_{2,m}(\mathbb{R})$, and suppose that $C^T C$ has non-negative entries. Let
$(c_1,\dots,c_m)$ be the list of column vectors of $C$, and let $\theta_{j,k}$ be the angle between
$c_j$ and $c_k$, ignoring any columns which are zero. Look at the maximum
value of $\theta_{j,k}$ and the two columns where it occurs; find an orthogonal matrix
which moves one of these two columns to the positive $x$-axis.


To make more progress on our question, we would like to better understand
the invariant $f$. Given $C \in M_{n,m}(\mathbb{R})$, we can ask for solutions $D$ to the
equation $f(C) = f(D)$, and then seek non-negative examples for $D$ within the
solution set. A more “neutral” way of asking this question is: Given a matrix
$M \in M_m(\mathbb{R})$, for which $A \in M_{n,m}(\mathbb{R})$ do we have $f(A) = M$? In other
words, what is the pre-image $f^{-1}(M)$ of $M$ under $f$? Perhaps the most basic
version of this question is, when is $f^{-1}(M)$ non-empty? In order to simplify
the question still further, we will restrict from now on to the case $m = n$.
Thus we are asking: For which $M \in M_n(\mathbb{R})$ does there exist $C \in M_n(\mathbb{R})$ such
that $C^T C = M$?

Our experience so far suggests that we might view $C^T C$ as the middle
part of the formula for the length of the image of a vector under $C$: namely,
$|Cv|^2 = (Cv)^T Cv = v^T C^T C v$. This gives a necessary condition on any matrix
$M$ for which $f^{-1}(M)$ is non-empty:


Lemma 23.86. Let $M \in M_n(\mathbb{R})$. If $f^{-1}(M)$ is non-empty, then we have
$v^T M v \ge 0$ for every $v \in \mathbb{R}^n$.

Task 23.8.26. Prove Lemma 23.86.

Notice that the condition in Lemma 23.86 involves the same formula that
defines an inner product from a given square matrix (see Task 23.8.6). Thus
we arrive at the following definition:


Definition 23.87. Let $\cdot$ be an inner product on $\mathbb{R}^n$. We say that $\cdot$ is positive
semi-definite if for all $v \in \mathbb{R}^n$ we have $v \cdot v \ge 0$. We say that $\cdot$ is positive
definite if $v \cdot v > 0$ for all $v \in \mathbb{R}^n - \{0\}$. We say that a matrix $M \in M_n(\mathbb{R})$
is positive semi-definite (respectively, positive definite) if the inner product
corresponding to $M$ with respect to the standard basis of $\mathbb{R}^n$ is positive
semi-definite (respectively, positive definite).


What difference does the “semi-” make in a semi-definite matrix $M$? Of
course, it allows some values of $v \cdot v$ to be 0 even when $v \ne 0$, where $\cdot$ is the
inner product corresponding to $M$. As the reader may have suspected, this is
related to the question of which vectors $M$ sends to 0; that is, to whether $M$
is singular.

Task 23.8.27. Let $M \in M_n(\mathbb{R})$. Prove that if $M$ is singular, then $M$ is not
positive definite.

There is another property we can wrestle out of the special form $C^T C$.
This property is suggested by the identity $v^T w = w^T v$.

Task 23.8.28. Let $C \in M_n(\mathbb{R})$. Prove that the matrix $C^T C$ is symmetric.
Conclude that if $M \in M_n(\mathbb{R})$ and $f^{-1}(M)$ is non-empty, then $M$ is symmetric.

Grouping together Lemma 23.86 and Task 23.8.28, we have the following:


Lemma 23.88. Let $M \in M_n(\mathbb{R})$. If there is a matrix $C \in M_n(\mathbb{R})$ such that
$C^T C = M$, then $M$ is symmetric and positive semi-definite.


Proposition 23.80 is quite strong: it “almost” gives a unique form for choosing
representatives from each equivalence class of $M_n(\mathbb{R})$ under $\sim$. This proposition
may be used in the following task. First we give a definition that isolates
the upper-left corners of a square matrix.




Definition 23.89. Let $A \in M_n(F)$, where $F$ is a field. For $k \in \{1,2,\dots,n\}$,
the $k$th principal minor of $A$ is the matrix $B \in M_k(F)$ such that $B_{i,j} = A_{i,j}$
for all $i, j \in \{1,2,\dots,k\}$.


Task 23.8.29. Let $C \in M_n(\mathbb{R})$, and let $A = C^T C$. Use Proposition 23.80
to prove that there is an upper triangular matrix $B \in M_n(\mathbb{R})$ such that for
every $j \in \{1,2,\dots,n\}$, we have $A_j = B_j^T B_j$, where $A_j$ and $B_j$ denote the
$j$th principal minors of $A$ and of $B$, respectively. Use this result to help prove
Lemma 23.90 below.


Lemma 23.90. Let $A \in M_n(\mathbb{R})$. If $A = C^T C$ for some $C \in M_n(\mathbb{R})$, then we
have $\det(A_j) \ge 0$ for all $j \in \{1,2,\dots,n\}$, where $A_j$ is the $j$th principal minor
of $A$. Furthermore, if $\det(A_j) = 0$ for some $j$, then we have $\det(A) = 0$.


Now that attention has been called to the principal minors of a positive 
semi-definite matrix, we will bring minors to bear on the question of definite 
versus semi-definite. 


Task 23.8.30. Let $A \in M_n(\mathbb{R})$ be positive semi-definite. Prove that if
$\det(A_j) = 0$ for some $j \in \{1,2,\dots,n\}$, where $A_j$ is the $j$th principal minor of
$A$, then $A$ is not positive definite.

Next we investigate a converse to the results of Tasks 23.8.30 and 23.8.29.


Task 23.8.31. Let $A \in M_n(\mathbb{R})$, and suppose that $A$ is symmetric and
$\det(A_j) > 0$ for all $j \in \{1,2,\dots,n\}$, where $A_j$ is the $j$th principal minor of
$A$. Prove that $A = C^T C$ for some $C \in GL_n(\mathbb{R})$. Suggestion: Show inductively
that $A_j$ is of the form $C(j)^T C(j)$ where $C(j) \in M_j(\mathbb{R})$ and $C(j)$ is upper
triangular, with non-zero entries on the diagonal. Use the following steps to move
the induction from $j$ to $j + 1$. Show that we must have $\det(C(j)) \ne 0$. Look
for a vector $v \in \mathbb{R}^j$ that has the appropriate dot products with the columns
of $C(j)$. The final step of the induction is to show that $|v|^2 < A_{j+1,j+1}$ so that
we may choose the $(j+1)$th component of $v$. For this purpose, let $z$ be a square
root of the real number $A_{j+1,j+1} - |v|^2$ in $\mathbb{C}$, and show that this assignment
gives the desired matrix $A_{j+1}$; deduce that if $z \in \mathbb{C} - \mathbb{R}$ then $\det(A_{j+1}) < 0$,
a contradiction.


Task 23.8.32. Prove Theorem 23.91 below. This should not involve much 
“new” work. 


Theorem 23.91. Let $A$ be a symmetric matrix in $M_n(\mathbb{R})$. Then the following
are equivalent:

(i) $A = C^T C$ for some $C \in GL_n(\mathbb{R})$.

(ii) $\det(A_j) > 0$ for all $j \in \{1,2,\dots,n\}$, where $A_j$ is the $j$th principal
minor of $A$.
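The column-by-column construction suggested in Task 23.8.31 can be turned into a short computation, commonly known as a Cholesky factorization. The Python sketch below is one such version in floating-point arithmetic; the function name `upper_factor` and the sample matrices are assumptions, and the code is meant as an illustration of the idea rather than as the argument required by the task.

```python
import math

def upper_factor(A):
    """Return upper triangular C with C^T C = A, or None if a pivot is not positive.

    A is a symmetric matrix given as a list of rows.
    """
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j + 1):
            # Dot product of the already-computed parts of columns i and j of C
            s = sum(C[k][i] * C[k][j] for k in range(i))
            if i == j:
                pivot = A[j][j] - s
                if pivot <= 0:
                    return None
                C[j][j] = math.sqrt(pivot)
            else:
                C[i][j] = (A[i][j] - s) / C[i][i]
    return C

C = upper_factor([[4, 2], [2, 3]])         # both principal minors have positive determinant
no_factor = upper_factor([[1, 2], [2, 1]])  # second principal minor has determinant -3
```

For the first matrix the sketch returns an upper triangular factor with rows $(2, 1)$ and $(0, \sqrt{2})$; for the second it returns `None`, since the quantity under the square root becomes negative.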


Finally, we can give a partial answer to our original question. Although 
not a complete answer, it is surprising enough to form a conclusion to this 
project. 




Definition 23.92. Let $n \in \mathbb{Z}^+$. The non-negative orthant of $\mathbb{R}^n$ is the subset
of $\mathbb{R}^n$ whose elements have non-negative components; i.e., the set

$$\{(x_1,\dots,x_n) \in \mathbb{R}^n : x_i \ge 0 \text{ for all } i\}.$$


Proposition 23.93. There exists a list of $n$ vectors in $\mathbb{R}^n$ such that every
pair of vectors meets at an angle of at most 90°, but such that these vectors
cannot be simultaneously rotated into the non-negative orthant of $\mathbb{R}^n$. More
formally, there exists a matrix $C \in M_n(\mathbb{R})$ such that $C^T C$ has non-negative
entries, but such that there does not exist a matrix $A \in O_n(\mathbb{R})$ such that $AC$
has non-negative entries.


Proof. Let $n = 5$. Consider the matrix

$$M = M(a) = \begin{pmatrix} 1 & 0 & 0 & a & a \\ 0 & 1 & 0 & a & a \\ 0 & 0 & 1 & a & a \\ a & a & a & 1 & 0 \\ a & a & a & 0 & 1 \end{pmatrix}$$

where $a$ is a non-negative real number. Let $M_j(a)$ denote the $j$th principal
minor of $M$ as a function of $a$. When $a = 0$, then $M = I$, so we have
$\det(M_j(0)) = 1$ for every $j \in \{1,\dots,5\}$. Now $\det(M_j(a))$ is a polynomial
function in $a$, so by continuity, there exists a small positive value of $a$ such
that $\det(M_j(a)) > 0$ for each $j$. Fix such an $a$. By Theorem 23.91, we can
write $M = C^T C$ for some matrix $C \in M_n(\mathbb{R})$. Assume for a contradiction
that $C$ has only non-negative entries. Then, since the first 3 column vectors
of $C$ are mutually orthogonal, at least one of these columns has exactly one
non-zero entry; without loss of generality (in view of the symmetry of $M$ with
respect to the first 3 columns), the only non-zero entry in the first column of
$C$ is $C_{1,1} = 1$. But then $C_{1,4} = C_{1,5} = a$, so columns 4 and 5 of $C$ cannot be
orthogonal, contradicting the fact that $M_{4,5} = 0$.
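The continuity step in the proof can be made concrete for a specific value of $a$. The Python sketch below assumes that $M(a)$ is the $5 \times 5$ symmetric matrix with 1s on the diagonal, entry $a$ wherever one of the first three indices meets one of the last two, and 0 elsewhere (this reading is inferred from the properties used in the proof). It then computes the determinants of the five upper-left corners exactly for $a = 1/10$. The function names `det` and `M` and the choice of $a$ are assumptions made for illustration.

```python
from fractions import Fraction

def det(A):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    A = [[Fraction(x) for x in row] for row in A]
    n = len(A)
    result = Fraction(1)
    for c in range(n):
        pivot_row = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot_row is None:
            return Fraction(0)
        if pivot_row != c:
            A[c], A[pivot_row] = A[pivot_row], A[c]
            result = -result          # a row swap changes the sign
        result *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            A[r] = [x - factor * y for x, y in zip(A[r], A[c])]
    return result

def M(a):
    return [[1, 0, 0, a, a],
            [0, 1, 0, a, a],
            [0, 0, 1, a, a],
            [a, a, a, 1, 0],
            [a, a, a, 0, 1]]

a = Fraction(1, 10)
minors = [det([row[:j] for row in M(a)[:j]]) for j in range(1, 6)]
```

For this value of $a$ all five determinants are positive, so $a = 1/10$ is one admissible choice of the "small positive value" in the proof.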


For further information on this topic, see the article [7]. 


23.9 Power Series 


Prerequisites. Having completed the text through Chapter 20 is recommended 
to fully understand this material; the reader who has completed through Chap- 
ter 14 should be able to reach Task 23.9.8. 


We have studied polynomials since our first (middle-school) algebra class,
and devoted a whole chapter to them in this text (Chapter 14). Those readers
of a more daring nature may have wondered what happens to polynomials if
we allow the powers of the variable to go up forever: is there such a thing as
an “infinite polynomial” which is an infinite sum of terms $a_0 + a_1 x + a_2 x^2 + \cdots$
and so on, without ending? Those who have studied calculus may have seen
such things, and they are called power series. As with polynomials, however,
we will be careful to distinguish between a formal power series, where we have
a variable like $x$ which is algebraically independent over the coefficient ring,
versus a power series function whose “variable” actually represents, say, a real
number. In a typical calculus class, these two different concepts are easily
confused.


Definition 23.94. Let R be a commutative ring with 1. A power series in
the variable x with coefficients from R is a sum

    f = \sum_{j=0}^{\infty} a_j x^j

where a_j ∈ R. More precisely, we can represent f as a function f : N → R
from the set of natural numbers to the ring R, where f(j) = a_j. Unlike in the
case of polynomials, we do not require the coefficients a_j to be zero for all
j >> 0.


Notation 23.95. The set of all power series in x with coefficients from R is 
denoted R[[x]]. 


Our immediate worry with power series is whether we can add and multiply 
them without having to compute an infinite sum or an infinite product. If we 
had to compute something like 


    a_0 + a_1 + a_2 + ⋯


with a_j ∈ R, then we would be in trouble! It turns out that everything is
all right:


Task 23.9.1. Prove that addition and multiplication of power series can be
defined as in Definition 14.1, but replacing the upper bounds of the sums by
∞, except that the upper bound in the inner sum in Equation 14.2 remains
k. (This amounts to noting that in this definition, all of the sums of elements
of R are finite sums.) Then prove the power series version of Lemma 14.8,
namely:


Lemma 23.96. If R is a commutative ring with 1, then so is R[[x]].


Finally, show that we have a natural embedding R[x] ↪ R[[x]].


Task 23.9.2. Let R be a commutative ring with 1. Suppose that instead of
using the natural numbers as powers of the variable x, we allow any integer
power, to get things such as

    \sum_{j=-\infty}^{\infty} a_j x^j

with a_j ∈ R. Let us call such things doubly infinite series. Show that we
can still define addition of two doubly infinite series, but that the “natural”
definition of multiplication of doubly infinite series fails to make sense in
general.


Task 23.9.3. Let R be a commutative ring with 1, and let

    f = \sum_{j=0}^{\infty} x^j ∈ R[[x]].

Calculate (1 − x) · f to prove that

    f = (1 − x)^{-1} in R[[x]].    (23.15)


This result may not surprise the reader who is familiar with geometric 
series; what is new here for those who studied geometric series in algebra or 
calculus? First, we do not need to add the condition “if |x| < 1” to Equation
23.15; indeed, this condition does not even make sense here, since x is a
variable, not necessarily a number. Instead, Equation 23.15 is an identity in the
ring R[[x]].

In calculus, we learn that we can add up an infinite series of real numbers 
only if the numbers get small (approach 0) as we add more and more of them. 
There is a similar way to think about the infinite sum in a power series; in 
this case, we can think of the higher and higher powers of x as being closer 
and closer to 0. In terms of ideals, this is more natural, since we have the 
following: 


Task 23.9.4. Let R be a commutative ring with 1, and let I = (x) = xR[[x]]
be the ideal generated by x in the power series ring R[[x]]. Prove that for
every positive integer n, we have I^n = (x^n). Then prove that ∩_{n=1}^∞ I^n = (0).

Definition 23.97. Let S be a commutative ring with 1, and suppose that I is
an ideal of S such that ∩_{n=1}^∞ I^n = (0). We say that S is complete with respect
to I if for every sequence a_1, a_2, ... of elements of S satisfying a_{n+1} ≡ a_n
mod I^n, there is a unique element a ∈ S such that a ≡ a_n mod I^n for all
n ∈ Z^+.


Notation 23.98. In the situation of Definition 23.97, we will write 


    a = \lim_{n \to \infty} a_n


and we say that a is the limit of the a_n.


Completeness is an important notion that occurs in both analysis and al- 
gebra, although the definition given above, in terms of ideals, is more common 
in algebra. The point of completeness is that a sequence of ring elements that 
“seems to be going somewhere” must in fact have a limiting value within the 
given ring. 




Task 23.9.5. Let R be a commutative ring with 1 ≠ 0. Prove that R[[x]] is
complete with respect to (x), but that the corresponding statement for R[x]
is false.


Task 23.9.6 (Limit Laws). Prove that in a complete ring, limits commute
with both addition and multiplication. More precisely, let S be a commuta-
tive ring with 1 which is complete with respect to an ideal I. Suppose that
lim_{n→∞} a_n = a and lim_{n→∞} b_n = b in S. Prove that lim_{n→∞} (a_n + b_n) = a + b
and lim_{n→∞} (a_n b_n) = ab.


We have seen that polynomial rings do not have very many units, at least
when the coefficient ring is a domain (Lemma 14.12). But in Task 23.9.3, we
found that for any commutative ring R with 1, the polynomial 1 − x is always a
unit of the bigger ring of power series R[[x]]. Exactly what can we say about
units in power series rings?


Task 23.9.7. Let R be a commutative ring with 1, and let S = R[[x]]. Prove
that we have

    S^× = {a_0 + a_1 x + a_2 x^2 + ⋯ ∈ S | a_0 ∈ R^×}.

That is, a power series is a unit iff its constant coefficient is a unit, no matter
what the higher terms look like. Suggestion: The harder direction is the ⇐
implication. Suppose f ∈ S has unit constant term. Prove by induction that
for every n ∈ N there is a polynomial g_n ∈ R[x] of degree at most n such that
f · g_n ≡ 1 mod (x^{n+1}), with g_{n+1} ≡ g_n mod (x^n). Then use the Limit Laws.
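The term-by-term construction suggested in Task 23.9.7 can be tried out on truncated power series. The following Python sketch is only an illustration, not part of the text's development: a series is stored as the list of its first N coefficients, coefficients are taken in Q via `fractions.Fraction`, and the helper names `mul` and `inverse` are our own.

```python
from fractions import Fraction

def mul(f, g, N):
    """Product of two power series, truncated to the first N coefficients."""
    h = [Fraction(0)] * N
    for i, a in enumerate(f[:N]):
        for j, b in enumerate(g[:N - i]):
            h[i + j] += a * b
    return h

def inverse(f, N):
    """First N coefficients of 1/f, assuming the constant term f[0] is a unit."""
    f = [Fraction(c) for c in f] + [Fraction(0)] * N   # pad so f[k] exists for k < N
    g = [1 / f[0]]
    for n in range(1, N):
        # choose g[n] so that the x^n coefficient of f*g vanishes
        s = sum(f[k] * g[n - k] for k in range(1, n + 1))
        g.append(-s / f[0])
    return g

N = 8
geometric = inverse([1, -1], N)        # truncation of 1/(1 - x)
print(geometric)
print(mul([1, -1], geometric, N))      # truncation of (1 - x) * f
```

Each new coefficient only requires dividing by the constant term, which is why a unit constant term is all that is needed.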


From now on, we restrict the coefficient ring of our power series to be a 
field, in order to get some stronger results. When we move from polynomials 
to power series, we lose the concept of degree, since a single power series can 
involve arbitrarily large powers of x. But when we work over a field, then the
smallest power of x in a power series plays a similar role: 


Task 23.9.8. Let F be a field, and let S = F[[x]]. Let a ∈ S − (0). Prove that
there is a unique n ∈ N and a unique u ∈ S^× such that a = x^n · u. Use this
result to prove that S is a PID.


Notation 23.99. Let F be a field, and let S = F[[x]]. We denote the field
of fractions of S by F((x)). We call F((x)) the field of Laurent series with
coefficients in F.


Next, we introduce a standard notation for localizing when we “really”
just need to invert a single non-zero ring element:


Notation 23.100. Let S be a domain, and let a ∈ S − (0). Let U = {a^n | n ∈
N}. We denote the localization S[U^{-1}] by S[a^{-1}].


Task 23.9.9. Let F be a field, and let S = F[[x]]. Prove that F((x)) =
S[x^{-1}] = { \sum_{j=k}^{\infty} a_j x^j | k ∈ Z, a_j ∈ F }. That is, a general Laurent series
over F looks like a power series over F except that we can start summing at
a negative power of x.




We conclude this project with two applications of power series: one to 
discrete mathematics, and another to calculus. 


Task 23.9.10. Define a sequence a_1, a_2, ... by a_1 = a_2 = 1 and a_{n+2} = a_{n+1} + a_n
for all n ∈ Z^+ (compare to Task 23.7.16). Let f = \sum_{j=1}^{\infty} a_j x^j ∈ R[[x]].
Show that f satisfies the equation f = x + x^2 f + x f. Solve for f to see that
f is actually a rational function, and write the result in the form

    \frac{A}{x - \omega} + \frac{B}{x - \bar{\omega}},

where A and B are appropriate real numbers, and ω and ω̄ are the roots of
the denominator of f. Then use the formula for the sum of a geometric series
to find an explicit formula for the coefficients of f.
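As a sanity check on the first half of Task 23.9.10, one can expand the rational function x/(1 − x − x^2) term by term and compare its coefficients with the sequence a_1, a_2, .... The sketch below is a hedged illustration in Python with integer coefficients; the variable names (`d`, `g`, `f`) and the truncation length N are our own choices, and the inverse of the denominator is computed by the same term-by-term recipe used for units of a power series ring.

```python
N = 20
a = [1, 1]
while len(a) < N:
    a.append(a[-1] + a[-2])            # a_1, a_2, ..., a_N

d = [1, -1, -1]                         # denominator 1 - x - x^2
g = [1]                                 # coefficients of 1/d, found term by term
for n in range(1, N):
    g.append(-sum(d[k] * g[n - k] for k in range(1, min(n, 2) + 1)))

f = [0] + g[:N]                         # multiply by x: coefficients of x/d
print(f[1:] == a)
```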


Finally, we introduce derivatives of power series. 


Definition 23.101. Let R be a commutative ring with 1. For
f = \sum_{j=0}^{\infty} a_j x^j ∈ R[[x]], define the derivative of f with respect to x to be the
power series

    f' = \sum_{j=1}^{\infty} j a_j x^{j-1}.


Task 23.9.11. Prove that, just as for polynomials, the sum and product rules 
are true for derivatives of power series. 


Task 23.9.12. Let F be a field of characteristic 0. Prove that there is a unique
power series f ∈ F[[x]] with constant term 1 such that f' = f, and find an
explicit formula for f. Then compute f' from f to verify that f' = f. This
power series will be familiar if you studied Taylor series in calculus, as it is the
power series representation about x = 0 of the natural exponential function
e^x for x ∈ R. Thus, power series can let us see directly the inner workings
even of this mysterious function! In addition, you have just used power series
to solve the differential equation f' = f. In general, a “differential equation”
just means any equation involving derivatives.
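The equation f' = f can be explored numerically by comparing coefficients of x^j on both sides, which gives (j + 1) a_{j+1} = a_j. The Python sketch below works over Q as a stand-in for a field of characteristic 0; the function names are hypothetical, and the output only suggests the pattern that the task asks you to prove.

```python
from fractions import Fraction

def derivative(coeffs):
    """Formal derivative: sum a_j x^j  ->  sum j*a_j x^(j-1)."""
    return [j * c for j, c in enumerate(coeffs)][1:]

def solve_f_prime_equals_f(N):
    """First N coefficients of the series with constant term 1 and f' = f."""
    a = [Fraction(1)]
    for j in range(N - 1):
        a.append(a[j] / (j + 1))       # (j+1) * a_{j+1} = a_j; needs j+1 invertible
    return a

f = solve_f_prime_equals_f(8)
print(f)
print(derivative(f) == f[:7])
```

The division by j + 1 is where characteristic 0 is used: in characteristic p the recurrence breaks down at j + 1 = p.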


To go any farther in a discussion of the real function e^x would be out of
our scope. Unlike in the case of polynomial rings, there is no general way to
evaluate a power series at a given ring element: in the case of power series,
we would in general get an infinite sum of ring elements. This crosses the
boundary from algebra to analysis. However, we cannot resist giving one last
task that relates the function e^x to algebraic concepts:


Task 23.9.13. Let f ∈ F[[x]] be the power series from Task 23.9.12. Prove that
f is transcendental over the field of rational functions F(x); thus we say that
e^x is a transcendental function. Suggestion: assume for a contradiction that f
is algebraic over F(x); clear denominators from the irreducible polynomial of
f over F(x); and take derivatives until the constant term disappears. You need
to show that the remaining terms do not disappear to arrive at a contradiction.




23.10 Quadratic Probing 


Prerequisites. Having completed Chapter 17 is recommended. 


In computer science, especially within data science, one of the most basic 
problems is how to store and retrieve data quickly. Given a sequence of m 
memory locations (also called addresses) consecutively numbered 0, 1, 2, ...,
m − 1, and assuming that each data object fits into a single memory address,
how can we decide where to store the next object that comes in? How should 
we proceed to search for a requested object? 

A very simple solution is to store each new object in the next available 
memory address, keeping track of this last used address. But if we do this, 
then searching for a requested object would become slower and slower, on 
average, as the number of stored objects grew; in the worst case, where the 
requested object is never found (because it was never stored in the first place), 
the search time is proportional to the total number of stored objects. It turns 
out that there are much better solutions! 

One trusted solution to the data storage-and-retrieval problem is known as 
hashing. In this approach, each incoming object is first converted into a number 
by applying a fixed function h, called the hash function, which has as output
values the integers between 0 and m − 1. Then we try to store each incoming
object x at memory address h(x). Of course, if someone requests object x,
we also know to look for x at the same location, h(x). In this situation, the
entire collection of memory from 0 to m − 1 is called a hash table. For later
use, we set R_m = {0, 1, ..., m − 1}. (We note that our discussion is slightly
simplified, and is relevant when we only care whether or not a given object 
exists in the hash table. In practice, we instead designate a fixed portion of 
the data objects as a key, and apply the hash function to the key. For example, 
the key of a “Person” object could be the name of the Person; then given only 
a name, we could search for the corresponding Person.) 

The problem with hashing is that two different objects may have the same 
hash value; this is known as a collision, since both objects want to be stored at 
the same location. To resolve collisions, the usual solution is to find a memory 
location other than the preferred location h(x) in which to store x. Specifically, 
we search for an unused location, starting with h(x), and proceeding in a fixed 
order through the other memory locations until an unused address is found. 
In this procedure, we use the word probe instead of “search,” and the order in
which we probe for an unused address is given by a function

    p : R_m^2 → R_m,

called a probing function. To say p(i, j) = k means that when an object x
has hash value h(x) = i and we are performing the j-th probe for x, then we
should look at address k. We note that the initial attempt at probing has the
value j = 0, consistent with the choice of the set R_m. (Computer scientists,
like logicians, begin counting at zero.) We require p(i, 0) = i so that a probe
for an object x will always start at location i = h(x), as mentioned earlier; for
this reason, we sometimes refer to h(x) as the home position of object x.


Example 23.102. Suppose that m = 5 and we choose the probing function p
given by the formula p(i, j) = (i + j) % 5, where the % operator means to take
the remainder modulo 5. More precisely, % is a function from Z × (Z − {0})
to Z, with a % b = c where c ≡ a (mod b) and c ∈ R_b. (So % is “almost” a
binary operation on Z, but we do not allow b = 0.) Further suppose that an
object x has hash value h(x) = 3. Then when we attempt to add x to our hash
table, we will probe for an empty memory location in the order 3, 4, 0, 1, 2.
We may not need to use this entire probing sequence, since we stop as soon as
we find an empty location. If we are asked to retrieve x from the hash table,
we will use the same probing sequence, but this time, instead of looking for
an empty address, we look for an address which contains x. More generally,
for any positive integer m, we can use the function p(i, j) = (i + j) % m for
probing; this is called the linear probing function.
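The probing order of Example 23.102 can be reproduced in a few lines of Python. This is a minimal sketch under our own conventions: the table is a list in which `None` marks an empty address, and the names `linear_probe_sequence` and `insert` are hypothetical helpers rather than anything defined in the text.

```python
def linear_probe_sequence(home, m):
    """Addresses visited by linear probing, starting at the home position."""
    return [(home + j) % m for j in range(m)]

def insert(table, x, h):
    """Store x at the first empty address in its probing sequence."""
    m = len(table)
    for k in linear_probe_sequence(h(x), m):
        if table[k] is None:
            table[k] = x
            return k
    raise RuntimeError("hash table is full")

print(linear_probe_sequence(3, 5))     # [3, 4, 0, 1, 2]

table = [None] * 5
h = lambda x: x % 5                    # a toy hash function, chosen for illustration
print([insert(table, x, h) for x in (3, 8, 13)])
```

The three objects 3, 8, and 13 all have home position 3 under the toy hash function, so they land in consecutive cells; this is the clustering behavior that motivates other probing functions.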

It turns out that linear probing creates some problems which can be largely 
avoided using alternative probing functions. Namely (for those readers who 
are interested), if we assume that the hash values of incoming objects are
independent and uniformly distributed over the set R_m, then linear probing tends
to create large “clusters” of consecutive occupied locations in the hash table; 
this will increase the variance of the time needed to store and retrieve objects. 
Therefore, we define a slightly more complicated type of probing function. 


Definition 23.103. For a real number t, we write ⌊t⌋ for the unique integer a
such that a ≤ t < a + 1, and we call a the floor of t. The function p : R_m^2 → R_m
given by the formula

    p(i, j) = (i + (-1)^{j+1} \cdot \lfloor (j+1)/2 \rfloor^2) % m

is called the quadratic probing function for a hash table of size m.


Although quadratic probing helps to reduce clustering, it introduces an- 
other problem: a quadratic probing function may not be surjective! A lack of 
surjectivity prevents quadratic probing from reaching some memory locations, 
which could cause a data storage attempt to fail even if the hash table is not 
full. 


Task 23.10.1. Show by computation that the quadratic probing function is 
surjective when m = 7, but not surjective when m = 5. 
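The computation requested in Task 23.10.1 is easy to automate. The sketch below reads the quadratic probing function as p(i, j) = (i + (−1)^{j+1}·⌊(j+1)/2⌋^2) % m, so that the probes go to i, i + 1, i − 1, i + 4, i − 4, and so on; this sign convention is an assumption on our part. Here “surjective” is taken to mean that for every home position i, the probes j = 0, ..., m − 1 reach every address. The function names are hypothetical.

```python
def quadratic_probe(i, j, m):
    k = (j + 1) // 2                   # floor((j + 1) / 2)
    sign = 1 if j % 2 == 1 else -1     # (-1)^(j+1)
    return (i + sign * k * k) % m

def probe_sequence(i, m):
    return [quadratic_probe(i, j, m) for j in range(m)]

def is_surjective(m):
    return all(set(probe_sequence(i, m)) == set(range(m)) for i in range(m))

for m in (7, 5):
    print(m, probe_sequence(0, m), is_surjective(m))
```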

Our main concern in this project is to prove the following result, which 
tells us which table sizes m give surjective quadratic probing functions. 


Theorem 23.104 (Theorem Q). Let m be a positive integer. Then the
quadratic probing function is surjective for a hash table of size m iff one of
the following conditions is true:

(1) m = 1 or m = 2;
(2) m is prime and m % 4 = 3;
(3) m is even, m/2 is prime, and (m/2) % 4 = 3.
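Before proving Theorem Q, it can be reassuring to test it by brute force for small table sizes. The Python sketch below is an experiment, not a proof: it assumes the sign convention (−1)^{j+1} for the quadratic probing function, interprets surjectivity as “every home position reaches every address within m probes,” and uses helper names of our own choosing.

```python
def reachable(m, home=0):
    """Set of addresses visited by probes j = 0, ..., m - 1 from a home position."""
    cells = set()
    for j in range(m):
        k = (j + 1) // 2
        sign = 1 if j % 2 else -1
        cells.add((home + sign * k * k) % m)
    return cells

def probing_is_surjective(m):
    return all(len(reachable(m, i)) == m for i in range(m))

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def theorem_q_condition(m):
    return (m in (1, 2)
            or (is_prime(m) and m % 4 == 3)
            or (m % 2 == 0 and is_prime(m // 2) and (m // 2) % 4 == 3))

good = [m for m in range(1, 61) if probing_is_surjective(m)]
print(good)
print(good == [m for m in range(1, 61) if theorem_q_condition(m)])
```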


The proof of Theorem Q requires some knowledge of how arithmetic works 
modulo m, or, even better, some knowledge of abstract algebra. For this rea- 
son, a complete proof is usually not given in computer science courses. 

In what follows, we endow the set R_m with addition and multiplication
modulo m, so that R_m is isomorphic to the quotient ring Z/mZ.

To understand when quadratic probing is surjective for a hash table of 
size m, we will break the question down into simpler questions (“reduce” the 
problem). Our first task just establishes the translation between probing a 
hash table on the one hand, and modular algebra on the other hand. 


Task 23.10.2. Prove that quadratic probing is surjective for a hash table of
size m iff for every possible home position e ∈ R_m, we have

    {e + j^2, e − j^2 : 0 ≤ j ≤ m − 1} = R_m.


Since the set of perfect squares in R,, is evidently important here, we give 
it a name: 


Notation 23.105. We set S_m = {j^2 : j ∈ R_m}. Naturally enough, we interpret
−S_m to mean {−a : a ∈ S_m} = {−j^2 : j ∈ R_m}.

The next reduction amounts to saying that quadratic probing modulo m 
works for every home position iff it works for home position 0. The idea of the 
proof is simply to shift everything by the value of the home position. 

Task 23.10.3. Prove that quadratic probing is surjective for a hash table of
size m iff S_m ∪ −S_m = R_m.

Since S_m and −S_m are subsets of R_m, which is a finite set of size m, our
question amounts to asking whether the size of S_m ∪ −S_m is m; that is, we
have proved the following lemma:


Lemma 23.106. Quadratic probing is surjective for a hash table of size m
iff |S_m ∪ −S_m| = m.


It may not come as a surprise that −S_m has the same size as S_m, but it
does require a proof:


Task 23.10.4. Prove that |S_m| = |−S_m|.

Combining the previous results enables us to complete the following task.

Task 23.10.5. Prove that quadratic probing reaches every cell of a hash table
of size m iff |S_m| = (m + |S_m ∩ −S_m|)/2; and if this happens, then |S_m| ≥
(m + 1)/2. Suggestion: Use the formula |A ∪ B| = |A| + |B| − |A ∩ B|, which
is true for any finite sets A and B.

Since |R_m| = m, Task 23.10.5 tells us that for quadratic probing to be
surjective, over half of the elements of R_m must be squares. From your
familiarity with ordinary perfect squares in Z, this may seem like a bizarre
condition: very few ordinary integers are perfect squares, and the density of
the squares approaches 0 as numbers get bigger. But modulo m, it turns out
that about half of all numbers are perfect squares when m is prime, and the
density of squares diminishes as the number of prime factors of m increases.
This will emerge from the following work.


Task 23.10.6. Let p be an odd prime, and let r be a positive integer. Let
n = p^r, and let d_n = |S_n|/n, the “density” of the squares modulo n. Prove
that we have d_n = 1/2 + 1/(2n) if r = 1, and d_n < 1/2 if r > 1.

Suggestion: Consider the squaring function g : R_n → S_n given by the
formula g(a) = a^2, and look at the number of pre-images in R_n of an element
a ∈ S_n.


We also want to know what happens for powers of 2:

Task 23.10.7. Let n = 2^r where r is a positive integer, and let d_n = |S_n|/n.
Prove that d_n = 1 when r = 1, and d_n ≤ 1/2 when r > 1.
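The densities in Tasks 23.10.6 and 23.10.7 can be tabulated directly for small prime powers. The following Python sketch uses exact fractions; the names `squares` and `density` are our own, and the printed table is meant only as experimental evidence for the claims to be proved.

```python
from fractions import Fraction

def squares(n):
    """The set S_n of squares in R_n = {0, 1, ..., n - 1}."""
    return {(j * j) % n for j in range(n)}

def density(n):
    return Fraction(len(squares(n)), n)

for n in (3, 5, 7, 9, 25, 27, 2, 4, 8, 16):
    print(n, sorted(squares(n)), density(n))
```

For n = 4 the density is exactly 1/2, which shows that the bound for powers of 2 with r > 1 cannot be made strict.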


We next use the Chinese Remainder Theorem (Theorem 17.11) to under- 
stand the density of the squares modulo m for arbitrary positive integers m. 


Task 23.10.8 (Densities Multiply). Let m be a positive integer, and factor
m = \prod_{j=1}^{k} p_j^{r_j} with p_j distinct primes and r_j natural numbers. Let
f : R_m → \prod_{j=1}^{k} Z/p_j^{r_j}Z be the map of Theorem 17.11. Prove that we have

    f(S_m) = S_{p_1^{r_1}} × ⋯ × S_{p_k^{r_k}}   and   d_m = \prod_{j=1}^{k} d_{p_j^{r_j}}.

We are now ready to prove a large piece of the forward direction of 
Theorem Q. 


Task 23.10.9. Let m be a positive integer. Prove that if quadratic probing is 
surjective for a hash table of size m, then m = 1 or m = 2 or m is an odd 
prime or m is twice an odd prime. 


We are getting close to our final result, but we still need to understand
the relationship of S_p and −S_p when p is an odd prime. Specifically, to apply
Task 23.10.5 at “full strength,” we want to know how much these two sets
overlap.

Task 23.10.10. Let p be an odd prime. Prove that if −1 ∈ S_p, then S_p = −S_p;
but if −1 ∉ S_p, then S_p ∩ −S_p = {0}.

It remains to determine for which odd primes p is −1 a square modulo p.

Task 23.10.11. Let p be an odd prime. Prove that −1 ∈ S_p iff p ≡ 1 (mod 4).
Suggestion: Use the results of Exercises 12.15, 21.18, 18.4, and 8.11 on the
group R_p^×.

Finally we have enough tools to reach our goal. 

Task 23.10.12. Prove Theorem Q. Suggestion: First verify the cases m = 1
and m = 2. By Task 23.10.9, we may suppose that m = p or m = 2p where p
is an odd prime. Show that −S_{2p} = −(S_2 × S_p) = S_2 × −S_p. Thus when we
move from m = p to m = 2p in the equation of Task 23.10.5, both sides of the
equation are just multiplied by 2. So we see that quadratic probing works for
m = p iff it works for m = 2p. Thus we only need to consider the case m = p.
Consider the two cases p ≡ 1 (mod 4) and p ≡ 3 (mod 4) separately.


23.11 Euclidean Domains 


Prerequisites. Having completed through Chapter 20 is recommended. 


A Principal Ideal Domain (PID) is one of the nicest types of ring we have 
seen. Two examples of PIDs in our experience so far are the ring of integers Z 
and the ring of polynomials F[x] where F is any field. In both of these cases,
in order to prove that every ideal is principal, we used a “quotient-remainder”
formula

    g = qf + r

where g and f are given ring elements, and q and r are ring elements to be
determined, where the “remainder” r is in some sense smaller than f. In the
case of Z, “smaller” has the usual sense of < with integers, while in the case of
F[x], “smaller” meant having smaller degree. Thus in both cases, we were
able to associate an ordinary number s(a) with each (non-zero) ring element 
a, and show that the quotient-remainder formula always has a solution with 
s(r) < s(f) (provided f and r are not 0). In this project, we will use this idea 
to define a new type of ring, called a Euclidean Domain; then we will study 
one particular example of a Euclidean Domain and use it to prove a theorem 
from classical number theory. 


Definition 23.107. A Euclidean Domain is a domain R such that there exists
a function s : R − (0) → N with the following property: for all f, g ∈ R − (0)
there exist q, r ∈ R such that both

(i) g = qf + r, and

(ii) either r = 0 or s(r) < s(f).


Remark 23.108. The function s is called a Euclidean function; it is not formally
part of a Euclidean Domain, and in fact, any given Euclidean Domain will have
more than one Euclidean function.

Task 23.11.1. Verify that both Z and F[x], where F is any field, are Euclidean
Domains.

The next tasks establish the place of Euclidean Domains in the hierarchy
of rings.

Task 23.11.2. Prove that every Euclidean Domain is a PID. You should find
that the main idea from the proof of Theorem 14.22 can be applied here.


Task 23.11.3. Prove that if F is a field, then every function s : F − (0) → N
is a Euclidean function for F; hence F is a Euclidean domain.


Next, we will study a specific subring of C. This ring is named after the
mathematician Carl F. Gauss.

Definition 23.109. The set of Gaussian integers is the set

    G = {a + bi : a, b ∈ Z}

where i is as usual the imaginary unit in the field of complex numbers C.


Task 23.11.4. Show that we have G = Z[i], the ring of integers adjoin i.
Conclude that G is a subring of the field Q[i], and that every element of G has
a unique representation in the form a + bi with a, b ∈ Z.


Our next goal is to prove that G is a Euclidean domain. First we look at 
the geometry of how G sits inside of C. Figure 23.9 shows a small portion of 
G near 0. 


[Figure: the points a + bi with a, b ∈ {−3, ..., 3}, plotted as a square lattice
in the complex plane.]

FIGURE 23.9: Geometric View of the set of Gaussian integers, G

Looking at Figure 23.9, it seems natural to view G as the set of corner
points (vertices) of the solid unit squares

    {x + yi : a ≤ x ≤ a + 1, b ≤ y ≤ b + 1}

where a, b ∈ Z. In fact, this point of view will be useful in proving that G is a
Euclidean Domain; we now explore it further. These squares almost, but not
quite, partition the complex plane C; the trouble is that their edges overlap.
To fix this problem, we let

    S_{a,b} = {x + yi : a ≤ x < a + 1, b ≤ y < b + 1}

for a, b ∈ Z.


Task 23.11.5. Prove that the sets S_{a,b} partition C.


What is the algebraic significance of Task 23.11.5? By adding an appropriate
element of G, we can move from any point of C to the square S_{0,0} in
a unique way. The following task expresses this idea precisely.


Task 23.11.6. Note that (G,+) ≤ (C,+), and prove that S_{0,0} is a complete
set of left coset representatives of (G,+) in (C,+).


Next we return to the definition of a Euclidean Domain. Observe that 
condition (i) of Definition 23.107 contains the expression qf, where f is given
and q is an arbitrary ring element. We recognize this form as the general
element of the principal ideal (f).


Task 23.11.7. Let R be a domain. Prove that R is a Euclidean Domain iff
there is a function s : R − (0) → N such that for all f, g ∈ R − (0) there
exists r ∈ R with g − r ∈ (f) and either r = 0 or s(r) < s(f).


The result of Task 23.11.7 may be interpreted as saying that in a Euclidean 
Domain every ring element g is “close enough” to some element of (f), if we 
think of the function s as measuring some kind of distance. Therefore we will 
investigate the principal ideals I = (f) = Gf of G. To get an element of I, we
compute (x + iy)·f with x, y ∈ Z. Separating the real and imaginary components
of the first factor, we recognize that x·f is a point on the line in C containing
f and 0, namely, the line Rf. What about iy·f? If the reader has not seen
this before, you are in for a treat:


Task 23.11.8. Define a function μ_i : C → C by the formula z ↦ iz for
z ∈ C. Prove that μ_i is a linear transformation of C considered as a vector
space over R. Let B = {1, i}, the usual basis of C over R. Find the matrix
of μ_i with respect to B, and use this to show that μ_i is the counterclockwise
rotation by 90° about the origin in C.


Using Task 23.11.8, we see that the line Rif is perpendicular to the line
Rf. Furthermore, the points 0, f, if, and f + if provide the corner points of
a fundamental square with respect to (f) which plays the same role as S_{0,0}
does with respect to G; see Figure 23.10.


Task 23.11.9. Let f ∈ G − (0), and let I = (f) = Gf. Let

    S_f = {(x + iy)·f : 0 ≤ x < 1, 0 ≤ y < 1} = S_{0,0}·f.

Sketch the set S_f in Figure 23.10, and prove that S_f is a complete set of left
coset representatives of (I,+) in (C,+).

Our strategy to prove that G is a Euclidean Domain comes from under- 
standing the geometry of (f). Recall that given an arbitrary g € G, we need to 
find an element of (f) which is “close” to g. In a general Euclidean Domain R, 
we may not have a geometric interpretation of R, and the meaning of “close” 
is only made clear when we can define the function s; but in the case of G, we 
have a notion of distance in C which we will attempt to use in our definition 
of s. 


[Figure: the points (x + iy)·f with x, y ∈ Z, forming a tilted square lattice
in the complex plane, together with the perpendicular lines Rf and Rif.]

FIGURE 23.10: Geometric View of the principal ideal (f) of G


Task 23.11.10. Define s : G → N by the formula s(z) = |z|, the usual
absolute value of z as a complex number; that is, s(a + bi) = √(a^2 + b^2). Note
that we can interpret |z| as the distance of z from 0. Let f ∈ G − (0), and
let g ∈ G. Use geometry (pictures!) to convince yourself that there exists an
element w ∈ (f) such that s(g − w) < s(f). Then prove the existence of such
a w rigorously, using algebra.


You have just proved: 


Proposition 23.110. G is a Euclidean Domain; the usual complex absolute 
value function s(z) = |z| is a Euclidean function for G. 
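The geometric argument behind Proposition 23.110 translates into a short division routine: divide in C, then round each coordinate to the nearest integer. The Python sketch below is one possible illustration, with Gaussian integers stored as pairs (real part, imaginary part) and with the hypothetical function names `norm` and `gauss_divmod`. It compares squared absolute values, which are integers, rather than the absolute values themselves; the inequality is the same.

```python
def norm(z):
    """Squared absolute value |z|^2 of a Gaussian integer z = (re, im)."""
    return z[0] * z[0] + z[1] * z[1]

def gauss_divmod(g, f):
    """Return (q, r) with g = q*f + r and |r| < |f|, for f non-zero."""
    (a, b), (c, d) = g, f
    n = norm(f)
    # g / f = g * conj(f) / |f|^2 = (x + y i) / n
    x, y = a * c + b * d, b * c - a * d
    # nearest integers to x/n and y/n, using integer arithmetic only
    q = ((2 * x + n) // (2 * n), (2 * y + n) // (2 * n))
    # r = g - q*f
    r = (a - (q[0] * c - q[1] * d), b - (q[0] * d + q[1] * c))
    return q, r

q, r = gauss_divmod((11, 3), (1, 8))
print(q, r, norm(r), norm((1, 8)))
```

Rounding moves each coordinate of the exact quotient by at most 1/2, so the remainder satisfies |r| ≤ |f|/√2.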


As a corollary of Proposition 23.110, we have that G is a PID, hence also a 
UFD—all because of a bit of geometry with squares! This unique factorization 
result has powerful consequences for the study of ordinary integers, as we shall 
see next. 


Task 23.11.11. Define a function N : G → G by the formula N : w ↦
w·σ(w), where σ is the complex conjugation function. Prove that the image
of the function N lies in the set N of natural numbers. Use N to prove that if
a + bi divides c in G where a, b, c ∈ Z, then a^2 + b^2 divides c^2 in Z.

Task 23.11.12. Prove that we have G^× = {1, −1, i, −i}.


Task 23.11.13. Let p ∈ Z^+ be an odd prime.

(a) Prove that the equation a^2 + 1 ≡ 0 (mod p) has a solution a ∈ Z iff
p ≡ 1 (mod 4) (compare Task 23.10.11).

(b) Deduce that p is prime in G iff p ≡ 3 (mod 4). Hint: a^2 + 1 factors
in G.




Task 23.11.14. Prove Theorem 23.111 (below). 


Theorem 23.111. All primes in Z^+ which are congruent to 1 modulo 4 can
be written as the sum of two squares. More precisely, let p be a positive prime
integer such that p ≡ 1 (mod 4). Then there exist unique positive integers a, b
with a < b such that p = a^2 + b^2.
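Theorem 23.111 is easy to test experimentally. The Python sketch below searches by brute force for all pairs (a, b) with a < b and a^2 + b^2 = p; the function names are hypothetical, and the search is a numerical check for small primes rather than a substitute for the proof via Gaussian integers.

```python
from math import isqrt

def two_square_representations(p):
    """All pairs (a, b) of positive integers with a < b and a^2 + b^2 = p."""
    reps = []
    for a in range(1, isqrt(p) + 1):
        b2 = p - a * a
        b = isqrt(b2)
        if b * b == b2 and a < b:
            reps.append((a, b))
    return reps

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

for p in range(3, 60, 2):
    if is_prime(p):
        print(p, p % 4, two_square_representations(p))
```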


The quotient-remainder equation g = qf +r lends itself to the study of 
the greatest common divisor gcd(f,g). The idea is that a ring element divides 
both f and g if and only if it divides both f and r. This happens in any UFD, 
even without a Euclidean function, as you will show next. 


Task 23.11.15. Let R be a UFD, and let f, g ∈ R − (0). Suppose that g = qf + r
where q, r ∈ R. Prove that we have gcd(f, g) = gcd(r, f).


If we are in fact in a Euclidean Domain, then we can not only solve the 
quotient-remainder equation, but we can repeat the process until the remain- 
der r becomes 0: 


Algorithm 23.112 (Euclidean Algorithm). Let R be a Euclidean Domain with
Euclidean function s. Let f, g ∈ R − (0).

Step 1: Set i = 0, a_i = g, and a_{i+1} = f.

Step 2: Write a_i = q_i a_{i+1} + a_{i+2} with q_i, a_{i+2} ∈ R and either a_{i+2} = 0 or
s(a_{i+2}) < s(a_{i+1}).

Step 3: Replace i by i + 1.

Step 4: If a_{i+1} = 0 then output a_i, else go to Step 2.
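The four steps of the Euclidean Algorithm can be followed literally in code. The sketch below specializes to R = Z with s(n) = |n|, using Python's built-in `divmod` to solve the quotient-remainder equation; the function name `euclidean_algorithm` and the list `a` holding the sequence a_0, a_1, ... are our own choices for illustration.

```python
def euclidean_algorithm(g, f):
    """Follow the steps of the Euclidean Algorithm in Z; g and f must be non-zero."""
    a = [g, f]                           # Step 1: a_0 = g, a_1 = f
    i = 0
    while True:
        q, r = divmod(a[i], a[i + 1])    # Step 2: a_i = q*a_{i+1} + r, |r| < |a_{i+1}|
        a.append(r)
        i += 1                           # Step 3
        if a[i + 1] == 0:                # Step 4
            return a[i]

print(euclidean_algorithm(252, 198))
```

With negative inputs the output may be negative; it still generates the same ideal of Z as the usual greatest common divisor.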


Task 23.11.16. Prove that the Euclidean Algorithm will always produce an 
output for any non-zero inputs f, g; that is, prove that the condition a_{i+1} = 0
will eventually become true. (This “halting property” is actually one of the
requirements that an algorithm must satisfy by definition.) Furthermore, prove
that the output value r has the property that (r) = gcd(f, g).


Remark 23.113. The Euclidean Algorithm is very fast—that is, requires very 
few steps to complete, including repetitions—in the case when R = Z and 
s(n) = |n|. Thus although it seems to be rather hard to factor a single integer, 
it is relatively easy to find the greatest common factor of two given integers. 
On the other hand, in the case of a polynomial ring over a field with s equal 
to the degree function, the Euclidean Algorithm can be much slower. 


23.12 Resultants 


Prerequisites. Having completed Project 23.11 is recommended before under- 
taking this project. 


In this project we will apply a modified version of the Euclidean Algorithm
to polynomials. The idea is that, unlike in the case of Z, this algorithm has
a fairly “predictable” effect on polynomials, so the final result in the most
general case should be a well-defined object with some universal importance.
We want the new algorithm to respect the coefficients of the original input
polynomials so as never to require division by them. In order to carry out
this plan, we will make two modifications to the Euclidean Algorithm: we will
“slow it down” by changing Step 2 so that we are only allowing the “quotient”
polynomials q_i to be single terms of the form cx^j; and we will cross-multiply
appropriately at each iteration to avoid division.


Algorithm 23.114 (Modified Euclidean Algorithm). Given a field F and poly-
nomials f, g ∈ F[x] − (0).

Step 1: Set i = 0. Let f_i = f and g_i = g if deg(f) ≤ deg(g), else let f_i = g
and g_i = f.

Step 2(a): Write g_i = bx^n + (terms of lower degree) and f_i = ax^m +
(terms of lower degree), where a, b ∈ F^×.

Step 2(b): Let h_i = a g_i − b x^{n−m} f_i.

Step 2(c): If deg(h_i) < deg(f_i), then let g_{i+1} = f_i and f_{i+1} = h_i; else let
g_{i+1} = h_i and f_{i+1} = f_i.

Step 3: Replace i by i + 1.

Step 4: If f_i = 0 then output g_i, else go to Step 2(a).
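The steps of the Modified Euclidean Algorithm can be sketched in Python as follows. This is an illustration under our own conventions, not a reference implementation: a polynomial is a list of coefficients with `p[k]` the coefficient of x^k, the zero polynomial is the empty list and is given degree −1 so that it counts as smaller than every non-zero polynomial, and the names `trim`, `deg`, and `modified_euclid` are hypothetical. Only additions and multiplications of coefficients occur, so integer inputs stay integral.

```python
def trim(p):
    """Drop trailing zero coefficients; p[k] is the coefficient of x^k."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def deg(p):
    return len(p) - 1                      # the zero polynomial [] gets degree -1 here

def modified_euclid(f, g):
    """Return (output polynomial, final value of i) for non-zero inputs f, g."""
    f, g = trim(f), trim(g)
    if deg(f) > deg(g):                    # Step 1: arrange deg(f) <= deg(g)
        f, g = g, f
    i = 0
    while True:
        a, m = f[-1], deg(f)               # Step 2(a): leading terms a*x^m and b*x^n
        b, n = g[-1], deg(g)
        h = [a * c for c in g]             # Step 2(b): h = a*g - b*x^(n-m)*f
        for k, c in enumerate(f):
            h[k + n - m] -= b * c
        h = trim(h)
        if deg(h) < deg(f):                # Step 2(c)
            g, f = f, h
        else:
            g = h
        i += 1                             # Step 3
        if not f:                          # Step 4
            return g, i

f = [-21, 4, 1]                            # x^2 + 4x - 21
g = [-3, -5, -4, 2]                        # 2x^3 - 4x^2 - 5x - 3
print(modified_euclid(f, g))               # ([-255, 85], 4), i.e. 85x - 255 at i = 4
```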
Task 23.12.1. Apply the Modified Euclidean Algorithm (MEA) with $F = \mathbb{Q}$,
$f = x^2 + 4x - 21$, and $g = 2x^3 - 4x^2 - 5x - 3$. You should find that the
algorithm ends when $i = 4$ with the output value $g_4 = 85x - 255 = 85(x - 3)$.
Then apply the original Euclidean Algorithm with $R = \mathbb{Q}[x]$ where $s$ is the
degree function, and the same $f$ and $g$; compare the process and the results
to the modified version.
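
The steps of Algorithm 23.114 can be traced mechanically. The following Python sketch is our own illustration and not part of the text: it assumes that a polynomial over Q is stored as a list of coefficients with the constant term first, and the helper names `trim`, `deg`, and `mea` are hypothetical. It uses -1 as a stand-in for the degree of the zero polynomial, so that the comparison in Step 2(c) still makes sense when $h_i = 0$.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def deg(p):
    """Degree of a coefficient list; -1 stands in for deg(0)."""
    return len(p) - 1

def mea(f, g):
    """Run the Modified Euclidean Algorithm; return (output, final index i)."""
    f, g = trim(map(Fraction, f)), trim(map(Fraction, g))
    if deg(f) > deg(g):              # Step 1: arrange deg(f_i) <= deg(g_i)
        f, g = g, f
    i = 0
    while f:                         # Step 4: stop as soon as f_i = 0
        a, b = f[-1], g[-1]          # Step 2(a): leading coefficients
        shift = deg(g) - deg(f)      # the exponent n - m
        h = [a * c for c in g]       # Step 2(b): h = a*g - b*x^(n-m)*f
        for k, c in enumerate(f):
            h[k + shift] -= b * c
        h = trim(h)
        if deg(h) < deg(f):          # Step 2(c)
            f, g = h, f
        else:
            g = h
        i += 1                       # Step 3
    return g, i

f = [-21, 4, 1]          # x^2 + 4x - 21
g = [-3, -5, -4, 2]      # 2x^3 - 4x^2 - 5x - 3
output, steps = mea(f, g)
print(steps, [int(c) for c in output])   # 4 [-255, 85]
```

Only multiplication and subtraction of coefficients occur in the loop, which is the "no division" feature that the modification was designed to achieve.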


Task 23.12.2. Let $f, g \in F[x] - (0)$, where $F$ is a field and $\deg(g) \ge \deg(f)$.

(a) Note that because $F[x]$ is a PID, we always have $\gcd(f_i, g_i) = (f_i, g_i)$,
the ideal of $F[x]$ generated by $f_i$ and $g_i$, by Exercise 20.22. Prove that in
every iteration of the Modified Euclidean Algorithm, throughout each part of
Step 2 we will have $\deg(g_i) \ge \deg(f_i) \ge 0$; $\gcd(f_i, g_i) = \gcd(f_{i+1}, g_{i+1})$ in
$F[x]$; and $\deg(f_{i+1}) + \deg(g_{i+1}) < \deg(f_i) + \deg(g_i)$ if $f_{i+1} \neq 0$. Conclude
that the algorithm will eventually output a value $g_\ell$, and that we will have
$\gcd(f, g) = (f, g) = (g_\ell)$ in $F[x]$.

(b) Suppose further that $R$ is a domain with $R \le F$ and that $f, g \in R[x]$.
Prove that for each $i$ such that $f_i$ and $g_i$ are defined, we have $f_i, g_i \in R[x]$.
Also prove that $f_i$ and $g_i$ belong to the ideal $I$ of $R[x]$ generated by $f$ and $g$.
In particular, conclude that the output value $g_\ell$ is an element of $I$.


At this point, the reader may be wondering why we defined the Modified 
Euclidean Algorithm, since it seems to produce essentially the same result as 
the original Euclidean Algorithm but more slowly, and only in a special case! 
The key difference is to be found in part (b) of Task 23.12.2, which allows us 
to work within a chosen domain. By choosing this domain carefully, we will 
find next that the MEA can be used to solve systems of polynomial equations 
in more than one variable. 


Example 23.115. Suppose that $L$ is a field, and we have the system
\[
\begin{aligned} f(x, y) &= 0 \\ g(x, y) &= 0 \end{aligned} \tag{23.16}
\]
where $f, g \in L[x, y] - (0)$. To apply the MEA, we may view $f$ and $g$ as elements
of $(L[y])[x] \le (L(y))[x]$, where as usual $L(y)$ is the field of rational functions in
the variable $y$ with coefficients from $L$. Thus we will take $F = L(y)$, $R = L[y]$,
and apply the MEA using the given $f$ and $g$. Now (by Task 23.12.2) the MEA
is guaranteed to output a value $r \in L[y]$ such that $r$ is in the ideal of $L[x, y]$
generated by $f$ and $g$. But if $(\alpha, \beta) \in L^2$ is in the solution set of System 23.16,
then since $r = tf + ug$ for some $t, u \in L[x, y]$, we must have $r(\alpha, \beta) = 0$. Note
that $r$ only involves the variable $y$, and not $x$; thus the Modified Euclidean
Algorithm has eliminated a variable for us, creating a single-variable condition
that is easier (we hope) to solve than the original system!

Task 23.12.3. Suppose that $L$ is a field, $a, b, c \in L[y]$, and $f = x - a$,
$g = x^2 + bx + c \in L[x, y]$. Let $F = L(y)$. Verify that the MEA applied to these
inputs gives the output $r := a^2 + ab + c$ (provided that this quantity is not 0).
It is clear from looking at this particular system that if f =0 and g = 0 then 
we must have r = 0 too, but the point here is that the MEA produces that 
result automatically, and can produce similar results for more complicated 
systems whose solution is not obvious at first sight. 
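
As a small illustration of the elimination idea (the particular system is our own choice, not one from the text), take $L = \mathbb{Q}$, $f = x - y$, and $g = x^2 + y^2 - 2$, so that $a = y$, $b = 0$, and $c = y^2 - 2$ in the notation of Task 23.12.3. Assuming the output formula stated in that task, the eliminated condition, together with one explicit combination of $f$ and $g$ that produces it, would be:

```latex
\begin{align*}
r &= a^2 + ab + c = y^2 + 0 + (y^2 - 2) = 2y^2 - 2, \\
g - (x + y)\,f &= (x^2 + y^2 - 2) - (x^2 - y^2) = 2y^2 - 2 = r.
\end{align*}
```

So any common solution must have $y = \pm 1$, and then $f = 0$ forces $x = y$; one checks directly that $(1, 1)$ and $(-1, -1)$ both satisfy the system.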


We have seen that, like the original Euclidean Algorithm, the MEA ends 
by giving us a generator of the gcd of the original two input polynomials, 
working over the field F. We also expect that the bigger this gcd is (measured
by its degree with respect to x), the sooner the MEA procedure will end; 
the extreme case is when f = g, in which case the MEA ends with i = 1, 
giving output f. To gain more insight into the MEA, let us consider generic 
polynomials: that is, polynomials whose coefficients are actually variables, i.e., 
form an algebraically independent set (over the prime subfield of F, say; we
are engaging in a “thought experiment,” so we are not being very precise). For 
simplicity, we will also take our polynomials to be monic for now. We don’t 
expect two generic monic polynomials f and g to have a non-trivial gcd, so the 
MEA should not end early, but rather should iterate the maximum number of 
times possible given the degrees of f and g. On the other hand, because the 
coefficients of f and g are independent variables, we can imagine evaluating 
them at any values we choose; furthermore, we expect this evaluation to be 
compatible with the MEA, in the sense that we can perform the evaluation 
at any stage before, during, or after running the MEA procedure and get the 
same result—except that, as discussed above, the MEA will end earlier as 
the degree of the gcd increases. This chain of speculation leads to a concrete 
conjecture: namely, that the output r of the MEA on two generic polynomials 
should evaluate to 0 when we substitute values for the coefficients that cause 
the gcd to be non-trivial. Looking in a splitting field for $fg$ over $F$, where we
can factor $f = \prod_{i=1}^{m} (x - \alpha_i)$ and $g = \prod_{j=1}^{n} (x - \beta_j)$, we expect that if we
substitute $\alpha_i = \beta_j$ (for any given $i$ and $j$) then $r$ will become 0, for then $f$
and $g$ are forced to share a common factor.


Task 23.12.4. Let $L$ be a field, and let $S = L[\alpha_1, \ldots, \alpha_m, \beta_1, \ldots, \beta_n]$ be a
polynomial ring over $L$ in $m + n$ variables. Let $r \in S$, and suppose that
$\varepsilon(r) = 0$, where $\varepsilon : S \to S$ is the evaluation map over $L$ which fixes all of the
variables in $S$ except for mapping $\beta_1$ to $\alpha_1$. Prove that $\beta_1 - \alpha_1$ divides $r$ in $S$.


The preceding suggests that we should have $\beta_j - \alpha_i$ divides $r$ for all $i, j$.
Unique factorization in the ring S would then force the product of all of these 
expressions to divide r. With this motivation, we make the following definition. 


Definition 23.116. Let $F$ be a field, and let $f, g \in F[x] - (0)$ be monic
polynomials. Let $K$ be a splitting field for $fg$ over $F$, and write
$f = \prod_{i=1}^{m} (x - \alpha_i)$ and $g = \prod_{j=1}^{n} (x - \beta_j)$, with $\alpha_i, \beta_j \in K$. The resultant of
$f$ and $g$ (in the variable $x$) is
\[
\operatorname{res}_x(f, g) = \prod_{i=1}^{m} \prod_{j=1}^{n} (\beta_j - \alpha_i). \tag{23.17}
\]
(Recall that an empty product is defined to be 1; so if $f \in F$ or $g \in F$, then
$\operatorname{res}_x(f, g) = 1$.)


Task 23.12.5. Let $L$ be a field, and let $S = L[r_1, \ldots, r_{m+n}]$ be a polynomial
ring in $m + n$ variables over $L$. Let $f = \prod_{i=1}^{m} (x - r_i)$ and
$g = \prod_{j=m+1}^{m+n} (x - r_j) \in S[x]$. Write $f = x^m + \sum_{i=0}^{m-1} a_i x^i$ and
$g = x^n + \sum_{j=0}^{n-1} b_j x^j$ with $a_i, b_j \in S$.
Let $R = L[a_0, \ldots, a_{m-1}, b_0, \ldots, b_{n-1}] \le S$, and let $F$ be the field of fractions
of $R$.

(a) Use the definition of the resultant together with Galois theory to prove
that we have $\operatorname{res}_x(f, g) \in F$.

(b) Use Exercise 22.20 to prove that $\operatorname{res}_x(f, g) \in R$.

The next task presents equivalent formulas for the resultant of two monic 
polynomials, which will be used shortly as we investigate non-monic polyno- 
mials. 

Task 23.12.6. Under the assumptions and notation of Definition 23.116, prove
that we have
\[
\operatorname{res}_x(f, g) = \prod_{j=1}^{n} f(\beta_j) = (-1)^{mn} \prod_{i=1}^{m} g(\alpha_i) = (-1)^{mn} \operatorname{res}_x(g, f). \tag{23.18}
\]


We would like to extend the definition of res,(f,g) to the non-monic case, 
and still avoid denominators, so that the resultant can be written as a poly- 
nomial in the coefficients of f and g, as in the monic case. The following task 
provides motivation for the way we will do this. 


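Task 23.12.7. Let $f, g \in F[x] - (0)$, where $F$ is a field. Let $K$ be a splitting field
for $fg$ over $F$, and write $f = a \cdot \prod_{i=1}^{m} (x - \alpha_i)$ and $g = b \cdot \prod_{j=1}^{n} (x - \beta_j)$ with
$a, b \in F^\times$ and $\alpha_i, \beta_j \in K$. Prove that we have
$\prod_{j=1}^{n} f(\beta_j) = a^n \cdot \prod_{i=1}^{m} \prod_{j=1}^{n} (\beta_j - \alpha_i)$
and $(-1)^{mn} \prod_{i=1}^{m} g(\alpha_i) = b^m \cdot \prod_{i=1}^{m} \prod_{j=1}^{n} (\beta_j - \alpha_i)$.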


In order to get a formula for a general resultant which avoids denominators, 
Task 23.12.7 suggests that we should multiply the original resultant formula 
by $a^n b^m$.


Definition 23.117 (General Resultants). Let $F$ be a field, and let
$f, g \in F[x] - (0)$ be polynomials, not necessarily monic. Let $K$ be a splitting field
of $fg$ over $F$, and write $f = a \cdot \prod_{i=1}^{m} (x - \alpha_i)$ and $g = b \cdot \prod_{j=1}^{n} (x - \beta_j)$, with
$\alpha_i, \beta_j \in K$. The resultant of $f$ and $g$ (in the variable $x$) is
\[
\operatorname{res}_x(f, g) = a^n b^m \cdot \prod_{i=1}^{m} \prod_{j=1}^{n} (\beta_j - \alpha_i). \tag{23.19}
\]


Task 23.12.8. With the notation of Definition 23.117, prove that we have
\[
\operatorname{res}_x(f, g) = b^m \cdot \prod_{j=1}^{n} f(\beta_j) = (-1)^{mn} \cdot a^n \cdot \prod_{i=1}^{m} g(\alpha_i) = (-1)^{mn} \operatorname{res}_x(g, f).
\]
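
The formulas of Definition 23.117 and Task 23.12.8 are easy to test numerically when the roots are known. The Python sketch below is only an illustration under our own assumptions: the polynomials are given by a leading coefficient and a list of integer roots, the sample polynomials are arbitrary, and the function names `evaluate`, `resultant`, and `three_forms` are hypothetical.

```python
from itertools import product
from math import prod

def evaluate(lead, roots, x):
    """Value at x of the polynomial lead * prod(x - r for r in roots)."""
    return lead * prod(x - r for r in roots)

def resultant(a, alphas, b, betas):
    """a^n * b^m * prod(beta_j - alpha_i), with m = len(alphas), n = len(betas)."""
    m, n = len(alphas), len(betas)
    return a**n * b**m * prod(bj - ai for ai, bj in product(alphas, betas))

def three_forms(a, alphas, b, betas):
    """The three other expressions in Task 23.12.8, which should all agree."""
    m, n = len(alphas), len(betas)
    sign = (-1) ** (m * n)
    via_f = b**m * prod(evaluate(a, alphas, bj) for bj in betas)
    via_g = sign * a**n * prod(evaluate(b, betas, ai) for ai in alphas)
    swapped = sign * resultant(b, betas, a, alphas)
    return via_f, via_g, swapped

# f = 2(x - 1)(x - 3) and g = 3(x - 2)(x + 1)(x - 5), chosen only for illustration
print(resultant(2, [1, 3], 3, [2, -1, 5]))     # -4608
print(three_forms(2, [1, 3], 3, [2, -1, 5]))   # (-4608, -4608, -4608)
# A shared root forces the resultant to vanish
print(resultant(1, [1, 3], 1, [3, 7]))         # 0
```

A numerical agreement of this kind is of course no substitute for the proof requested in the task; it only guards against sign errors in the exponents of $-1$.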


Task 23.12.9. Let $F$ be a field, and let $f, g \in F[x] - (0)$ be polynomials.
Write $f = \sum_{i=0}^{m} a_i x^i$ and $g = \sum_{j=0}^{n} b_j x^j$ with $a_m, b_n \in F^\times$. Let $R$ be the
image of $\mathbb{Z}$ under the characteristic map $\chi : \mathbb{Z} \to F$. Prove that $\operatorname{res}_x(f, g)$
is an element of the ring $S := R[a_0, \ldots, a_m, b_0, \ldots, b_n]$. Suggestion: Consider
$\operatorname{res}_x(f/a_m, g/b_n)$; use Task 23.12.5.

Remark 23.118. Let $\mu(f, g)$ denote the output of the MEA with input $f$ and $g$.
The exact relationship between $\mu(f, g)$ and $\operatorname{res}_x(f, g)$ deserves to be explored,
but we leave this to the reader. We note that it can be shown that if $f, g \in R[x]$
for some domain $R \le F$, then $\operatorname{res}_x(f, g)$ is in the ideal of $R[x]$ generated by $f$
and $g$, just as was shown for $\mu(f, g)$ in Task 23.12.2.


23.13 Perfect Numbers and Lucas’s Test 


Prerequisites. It is recommended to have completed through Chapter 21 before 
starting this project. 

In this project we explore a type of ordinary integer known as a “perfect 
number,” which has held people’s interest since ancient times. Then we present 
a remarkable test from the nineteenth century which allows us to determine 
enormously large even perfect numbers extremely quickly. Finally, we work 
out a proof that this test does always give the correct answer. 

The idea of a perfect number is that it equals the sum of its “parts,” if 
we interpret parts appropriately. We want to say that a “part” of a whole 
number means a factor of that number; but we must exclude a number from 
being part of itself, or else the sum will almost always be too big. This leads 
to the following definitions. 




Definition 23.119. Let $n \in \mathbb{Z}^+$. A proper factor of $n$ is a positive integer $f$
such that $f$ divides $n$ in $\mathbb{Z}$ and $f \neq n$.


Definition 23.120. Let $n \in \mathbb{Z}$ with $n \ge 2$. Then $n$ is perfect if $n$ is equal to
the sum of its proper factors.


Example 23.121. The sum of the proper factors of a prime integer is always 
1. The sum of the proper factors of 4 is 3, and the sum of the proper factors 
of 6 is 1+2+3=6. Therefore, 6 is the smallest perfect number. 


Task 23.13.1. Verify that 28 is the second-smallest perfect number. 


Even though we excluded n itself when summing the factors of n, it is 
mathematically more natural to include n; this leads to simpler formulas, as 
we shall see. 


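Definition 23.122. Define a function $F$ on the domain $\mathbb{Z}^+$ by letting $F(n)$
be the set of all positive integer factors of $n$; that is,
\[
F(n) = \{ f \in \mathbb{Z} : 1 \le f \le n \text{ and } f \text{ divides } n \}.
\]
Define a function $\sigma : \mathbb{Z}^+ \to \mathbb{Z}^+$ by the formula
\[
\sigma(n) = \sum_{f \in F(n)} f.
\]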


The relationship of the function $\sigma$ to perfect numbers is clear; the reader
should verify the following lemma. 


Lemma 23.123. Let $n$ be an integer with $n \ge 2$. Then $n$ is perfect iff
$\sigma(n) = 2n$.
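
Both descriptions of perfect numbers are easy to compare by machine. The Python sketch below is our own illustration (the function names `sigma` and `is_perfect` are hypothetical): it computes $\sigma$ by brute force directly from Definition 23.122 and checks the statement of Lemma 23.123 on a small range.

```python
def sigma(n):
    """Sum of all positive factors of n (Definition 23.122), by brute force."""
    return sum(f for f in range(1, n + 1) if n % f == 0)

def is_perfect(n):
    """Definition 23.120: n >= 2 equals the sum of its proper factors."""
    return n >= 2 and sum(f for f in range(1, n) if n % f == 0) == n

# Lemma 23.123 predicts that the two tests agree for every n >= 2.
assert all(is_perfect(n) == (sigma(n) == 2 * n) for n in range(2, 500))
print([n for n in range(2, 1000) if sigma(n) == 2 * n])   # [6, 28, 496]
```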


Next we develop a formula for $\sigma$.


Task 23.13.2. Let $m$ and $n$ be relatively prime positive integers. Define a
function $\psi : F(m) \times F(n) \to \mathbb{Z}^+$ by the formula $\psi(f, g) = f \cdot g$. Prove that
the image of $\psi$ is the set $F(mn)$, and that $\psi$ is a bijection from $F(m) \times F(n)$
to $F(mn)$. Use this result to prove Lemma 23.124 below.


Lemma 23.124. If $m$ and $n$ are relatively prime positive integers, then we
have $\sigma(mn) = \sigma(m)\sigma(n)$.


Definition 23.125. Let $f : \mathbb{Z}^+ \to \mathbb{Z}^+$. Then $f$ is called multiplicative if we
have $f(a \cdot b) = f(a) \cdot f(b)$ for all $a$ and $b$ which are relatively prime.


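Task 23.13.3. Define a function $\rho : \mathbb{Z}^+ \to \mathbb{Q}^+$ by the formula $\rho(n) = \sigma(n)/n$.
Prove that $\rho$ is multiplicative (the condition of Definition 23.125 applies
verbatim to rational-valued functions).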


Task 23.13.4. Use the result of Lemma 23.124 inductively to prove that if
$n = \prod_{i=1}^{k} p_i^{e_i}$ where the $p_i$ are distinct positive primes and $e_i \in \mathbb{Z}^+$, then we
have $\sigma(n) = \prod_{i=1}^{k} \sigma(p_i^{e_i})$.


Task 23.13.5. Establish that if $p$ is a positive prime integer and $e \in \mathbb{Z}^+$, then
we have $\sigma(p^e) = (p^{e+1} - 1)/(p - 1)$. (Hint: Recall the formula for the sum of
a finite geometric series.) Use this together with the previous results to prove
Lemma 23.126 below.


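Lemma 23.126. Let $n \in \mathbb{Z}$ with $n \ge 2$, and let $n = \prod_{i=1}^{k} p_i^{e_i}$ be the prime
factorization of $n$ into powers of distinct primes. Then we have
\[
\sigma(n) = \prod_{i=1}^{k} \frac{p_i^{e_i + 1} - 1}{p_i - 1}.
\]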
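
Combining Tasks 23.13.4 and 23.13.5 expresses $\sigma(n)$ as a product over the prime powers in the factorization of $n$, with one factor $(p^{e+1} - 1)/(p - 1)$ for each prime power $p^e$. As a sanity check, the following Python sketch (our own; the names `sigma_brute`, `factorize`, and `sigma_formula` are hypothetical, and trial division is used only because it is short) compares that product with a brute-force sum of factors.

```python
from math import prod

def sigma_brute(n):
    """Sum of all positive factors of n, straight from the definition."""
    return sum(f for f in range(1, n + 1) if n % f == 0)

def factorize(n):
    """Prime factorization of n >= 2 as a dict {prime: exponent}, by trial division."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def sigma_formula(n):
    """Product of (p^(e+1) - 1)/(p - 1) over the prime powers p^e in n."""
    return prod((p ** (e + 1) - 1) // (p - 1) for p, e in factorize(n).items())

print(factorize(360))                          # {2: 3, 3: 2, 5: 1}
print(sigma_formula(360), sigma_brute(360))    # 1170 1170
```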


From now on, we focus on even perfect numbers. As of this writing, it is 
still unknown whether there are any odd perfect numbers. On the other hand, 
even perfect numbers can be shown to have a very special form, of which 
several dozen examples are known. To reach this form, we first establish some 
properties of the function $\rho$.


Task 23.13.6. Prove that if $n \in \mathbb{Z}^+$ and $p$ is a positive prime integer which
divides $n$, then we have $\rho(n) \ge 1 + 1/p$, with equality if and only if $n = p$.


Task 23.13.7. Show that if $e \in \mathbb{Z}^+$, then we have $2/\rho(2^e) = 1 + 1/(2^{e+1} - 1)$.


Task 23.13.8. Suppose that $n$ is an even perfect number. Write $n = 2^e \cdot c$
where $c$ is odd. Use the above results to prove that we have $p \ge 2^{e+1} - 1$
for any prime $p$ which divides $c$. On the other hand, prove that we also have
$2^{e+1} - 1$ divides $c$. Conclude that $2^{e+1} - 1$ must be prime.


Definition 23.127. A Mersenne prime is a prime integer of the form $2^j - 1$,
where $j \in \mathbb{Z}^+$.


Task 23.13.9. Prove that if $j \in \mathbb{Z}^+$ and $2^j - 1$ is prime, then $j$ must be prime.
Suggestion: Prove the contrapositive, using the fact that $x - 1$ divides $x^a - 1$
in $\mathbb{Z}[x]$ for any $a \in \mathbb{Z}^+$.

With this terminology in place, we now know that every even perfect number
$n$ is divisible by a Mersenne prime; and in fact, this Mersenne prime must
be equal to $2^j - 1$ where $2^{j-1}$ exactly divides $n$. It is natural to investigate
the simplest such case, when these two numbers form the entire prime factor- 
ization of n; it turns out that we have already struck gold: 


Task 23.13.10. Let $j \in \mathbb{Z}^+$, and suppose that $2^j - 1$ is a Mersenne prime.
Prove that $2^{j-1} \cdot (2^j - 1)$ is perfect.


Task 23.13.11. Use the previous results to prove Proposition 23.128 below.

Proposition 23.128. Let $p$ be a positive prime integer such that $2^p - 1$ is
prime. Then $2^{p-1} \cdot (2^p - 1)$ is perfect. Furthermore, every even perfect number
is of this form.
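
The first half of Proposition 23.128 can be checked on small cases. The Python sketch below is our own illustration (the names `sigma` and `euclid_form` are hypothetical); it sums factors in pairs $f$, $n/f$ to keep the computation short, and it includes $p = 11$ to show what happens when $2^p - 1 = 2047 = 23 \cdot 89$ is not prime.

```python
def sigma(n):
    """Sum of the positive factors of n, pairing each factor f <= sqrt(n) with n // f."""
    total, f = 0, 1
    while f * f <= n:
        if n % f == 0:
            total += f
            if f != n // f:
                total += n // f
        f += 1
    return total

def euclid_form(p):
    """The candidate 2^(p-1) * (2^p - 1) from Proposition 23.128."""
    return 2 ** (p - 1) * (2 ** p - 1)

for p in (2, 3, 5, 7, 11):
    n = euclid_form(p)
    print(p, n, sigma(n) == 2 * n)
# 2 6 True
# 3 28 True
# 5 496 True
# 7 8128 True
# 11 2096128 False
```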


We next present the remarkably simple and fast test that forms the second
part of this project’s title. Instead of our usual approach of motivating
results before stating them, we will start by stating the result, and then try to
“reverse-engineer” it to see why it is correct. The test takes an odd prime $p$ as
its input, and is supposed to determine whether the number $2^p - 1$ is prime.

Algorithm 23.129 (Lucas’s Test). Let $p$ be a positive odd prime integer. Let
$q = 2^p - 1$, and let $r = 4$. Repeat the following step $p - 2$ times:

Replace $r$ by $r^2 - 2$ (modulo $q$).

If the final value of $r$ is 0 modulo $q$, then output true ($q$ is prime); otherwise,
output false ($q$ is not prime).


Example 23.130. Let $p = 3$, so $q = 2^3 - 1 = 7$. Then we only need to perform
the main step of Lucas’s Test once; we compute $4^2 - 2 = 14 \equiv 0 \pmod{7}$, and
output true. Indeed, 7 is prime, so the test works in this case.
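
Algorithm 23.129 translates almost line for line into code. The following Python sketch is one possible rendering, not a reference implementation; the function name `lucas_test` is our own, and the caller is assumed to pass an odd prime $p$. Python integers have arbitrary precision, so no special integer type is needed.

```python
def lucas_test(p):
    """Algorithm 23.129 for an odd prime p: True exactly when the final r is 0 mod q."""
    q = 2 ** p - 1
    r = 4
    for _ in range(p - 2):           # repeat the main step p - 2 times
        r = (r * r - 2) % q
    return r == 0

print(lucas_test(3))    # True:  q = 7, as in Example 23.130
print(lucas_test(11))   # False: q = 2047 = 23 * 89
```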


Task 23.13.12. Use Lucas’s Test to determine (by hand) whether 2° — 1 is 
prime. If you know how, then write a computer program to determine all of 
the Mersenne primes up to 2°’ — 1 with Lucas’s Test (you may want to use an 
arbitrary-precision integer type for these computations); compare the running 
time to that of “trial division,” where we simply loop through all possible 
factors to determine primality. 


Until now, this project has only used elementary number theory; but to 
prove the correctness of Lucas’s Test, we shall need to use some abstract 
algebra, and also learn a bit more about the properties of numbers. 

There is only one “independent variable” in Lucas’s Test, namely the prime 
p. Sometimes in order to understand a very specific process, it is best to 
consider a more general situation. In this case, there are not many things to 
generalize, and we choose to generalize the number 4 into a variable, x. With 
this change, it’s not hard to see that Lucas’s Test creates a polynomial: the
first iteration turns $x$ into $x^2 - 2$; next we get $(x^2 - 2)^2 - 2$; etc. But where
should we say that the coefficients of these polynomials live? Since we are
told to compute modulo $q$, it would be natural to make $\mathbb{Z}/q\mathbb{Z}$ our coefficient
ring; in the case of most interest to us, namely when q is prime, this ring is 
a field. But we must somehow also deal with the case when q is not prime, 
in which case this ring is not even a domain—much worse! After considering 
the problem for some time, and with a bit of experience, we choose to define 
these polynomials over a general field: 


Definition 23.131. Let $K$ be a field. Let $f_0 = x \in K[x]$, and for $j \in \mathbb{Z}^+$,
define $f_j$ recursively by the formula $f_j = f_{j-1}^2 - 2 \in K[x]$.


Task 23.13.13. Let $n \in \mathbb{Z}^+$. Prove by induction that we have
\[
f_n = f_1 \circ f_1 \circ \cdots \circ f_1, \tag{23.20}
\]
where $f_1$ appears $n$ times in this composition; see Exercise 14.15 for this
notation and some of its properties. Conclude using Lemma 3.32 that we have
$f_n = f_j \circ f_k$ for any positive integers $j, k$ with $j + k = n$.


Task 23.13.14. Let $p$ be a positive prime integer, and $q = 2^p - 1$. Suppose that
$q$ is prime, and let $f = f_{p-2} \in K[x]$ where $K = \mathbb{Z}/q\mathbb{Z}$. Prove that Lucas’s
Test outputs “true” if and only if 4 is a root of $f$ in $K$ (where we interpret 4
as $4 \cdot 1_K = 4 + q\mathbb{Z}$).

From Task 23.13.14, we realize that we should study the roots of the
polynomials $f_j$. Since we are more familiar with the field $\mathbb{R}$ of real numbers than
with most other fields, let’s first take $K = \mathbb{R}$.

Task 23.13.15. Verify that with $K = \mathbb{R}$, the roots of $f_1$ are $\pm\sqrt{2}$, the roots of
$f_2$ are $\pm\sqrt{2 \pm \sqrt{2}}$, and in general, the roots of $f_n$ are
\[
\pm\sqrt{2 \pm \sqrt{2 \pm \sqrt{\cdots \pm \sqrt{2}}}}, \tag{23.21}
\]
where there are $n$ nested square root signs, and we can choose all of the $+$
and $-$ signs independently.


To make further progress, we appeal to our knowledge of trigonometric 
formulas. Specifically, if the reader has ever taken a 45°-angle and bisected 
it repeatedly, then the expression for the roots of $f_n$ should look somewhat
familiar. 


Task 23.13.16. Let $K = \mathbb{R}$ and let $n \in \mathbb{Z}^+$. Prove by induction that the roots
of $f_n$ are the real numbers $2\cos(\ell\pi/2^{n+1})$ where $\ell \in \mathbb{Z}$, $1 \le \ell \le 2^{n+2}$, and
$\ell \equiv 1 \pmod{4}$. Suggestion: Use the double-angle formula $\cos(2\alpha) = 2\cos^2(\alpha) - 1$,
and Task 23.13.13.
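
Before attempting the induction, it can be reassuring to test the claim in floating point. The Python sketch below is only a numerical illustration (the names `f` and `claimed_roots` and the tolerance `1e-9` are our own choices): it evaluates $f_n$ by iterating $x \mapsto x^2 - 2$ and checks that each of the $2^n$ proposed values is distinct and is a root up to rounding error.

```python
import math

def f(n, x):
    """Evaluate f_n at x by iterating x -> x^2 - 2 a total of n times (f_0 = x)."""
    for _ in range(n):
        x = x * x - 2
    return x

def claimed_roots(n):
    """The values 2cos(l*pi/2^(n+1)) for 1 <= l <= 2^(n+2) with l = 1 (mod 4)."""
    return [2 * math.cos(l * math.pi / 2 ** (n + 1))
            for l in range(1, 2 ** (n + 2) + 1) if l % 4 == 1]

for n in range(1, 6):
    roots = claimed_roots(n)
    worst = max(abs(f(n, r)) for r in roots)          # largest residual |f_n(root)|
    distinct = len({round(r, 9) for r in roots})      # number of distinct values
    print(n, len(roots), distinct, worst < 1e-9)
```

For each $n$ the script should report $2^n$ candidates, all distinct, with negligible residuals; since $f_n$ has degree $2^n$, this accounts for every root.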


It is fair to ask at this point whether trigonometric formulas can possibly
help us to understand the roots of $f_n$ over a finite field, since that is our main
interest. The answer is Yes, if we remember Equations 21.2 and 21.4, which
allow us to write a cosine in terms of roots of 1.


Task 23.13.17. Let $n, \ell \in \mathbb{Z}^+$. Show that we have
\[
2\cos(\ell\pi/2^{n+1}) = \zeta^{\ell} + \zeta^{-\ell},
\]
where $\zeta = \exp(2\pi i/2^{n+2})$, a root of unity in $\mathbb{C}^\times$ of order $2^{n+2}$.


The point is that roots of 1 are algebraic, so we may be able to translate
these ideas over to finite fields by looking for elements which satisfy the same
relations. Specifically, to get roots of $f_n$, we need $2^{n+2}$-th roots of 1; that is,
elements $\zeta$ such that $\zeta^{2^{n+2}} = 1$. To make sure that these roots are distinct,
we exclude the case $\operatorname{char}(K) = 2$.

Task 23.13.18. Let $K$ be a field with $\operatorname{char}(K) \neq 2$, let $n \in \mathbb{Z}^+$, and suppose
that the polynomial $g_n := x^{2^{n+2}} - 1$ splits completely in $K[x]$.

(a) Prove that $g_n$ has $2^{n+2}$ distinct roots in $K$.

(b) Prove by induction on $n$ that we have
$g_n = (x - 1)(x + 1)(x^2 + 1)(x^4 + 1) \cdots (x^{2^{n+1}} + 1)$.

(c) For an integer $j$ with $0 \le j \le n + 1$, prove that the roots of $x^{2^j} + 1$ in
$K$ are precisely the roots of $g_n$ which have order $2^{j+1}$ in $K^\times$, and that there
are exactly $2^j$ such roots.

(d) Let $\zeta$ be a root of unity of order $2^{n+2}$ in $K^\times$. Prove by induction on $n$
that $\zeta + \zeta^{-1}$ is a root of $f_n$ in $K$.

(e) Consider the function $\psi : K^\times \to K$ given by the formula $a \mapsto a + a^{-1}$.
Prove that for any $b \in K$, we have $|\psi^{-1}(b)| \le 2$. Conclude that $f_n$ splits
completely in $K$ with the $2^n$ distinct roots
\[
\{ \zeta + \zeta^{-1} : \zeta \text{ has order } 2^{n+2} \text{ in } K^\times \}.
\]


Let us focus next on the case when $q = 2^p - 1$ is a Mersenne prime.
The previous results suggest that we should look for an extension field $K$ of
$F := \mathbb{Z}/q\mathbb{Z}$ which contains roots of unity of order $2^p$. Recalling the results we
know about finite fields, we look for an extension field of degree $n$ over $F$ such
that $q^n - 1$ is divisible by $2^p$; because of the form of $q$, we don’t have to look
far:


Task 23.13.19. Let $p$ be an odd prime, and suppose that $q := 2^p - 1$ is prime.
Let $F := \mathbb{Z}/q\mathbb{Z}$.

(a) Prove that the polynomial $h := x^2 + 1$ is irreducible in $F[x]$.

(b) Let $K$ be a splitting field for $h$ over $F$, and let $i$ be a root of $h$ in $K$.
Prove that every element of $K$ can be written uniquely in the form $a + bi$ with
$a, b \in F$.

(c) Prove that the polynomial $x^{2^p} - 1$ splits completely over $K$.


Task 23.13.20. Assume the hypotheses and notation of Task 23.13.19. Prove
that the following are equivalent:

(i) Lucas’s Test returns true with input $p$.

(ii) 4 is a root of $f_{p-2}$ in $F$.

(iii) There is a solution to the equation $\zeta + \zeta^{-1} = 4$ where $\zeta \in K$ and $\zeta$
has order $2^p$ in $K^\times$.


Now we would like a way to tell whether an element $a + bi$ of $K^\times$ has
order $2^p$. We know from our earlier results that this is equivalent to saying
that $(a + bi)^{2^{p-1}} = -1$. But certainly a necessary condition for this to happen
is that $(a + bi)^{2^p} = 1$.

Task 23.13.21. Assume the hypotheses and notation of Task 23.13.19. Justify
the following statements: $K$ is Galois over $F$ of degree 2, so $G := \operatorname{Gal}(K/F)$
has order 2. The map $a + bi \mapsto a - bi$ (with $a, b \in F$) is a non-trivial element
of $G$, but so is the map $\omega \mapsto \omega^q$. Therefore, these two functions are equal, and
we have $(a + bi)^q = a - bi$ for all $a, b \in F$. (We could also deduce this from the
Binomial Theorem in characteristic $q$.) So $a + bi$ (where $a, b$ are in $F$ and are
not both 0) has order dividing $2^p$ iff $(a + bi)^{q+1} = 1$ iff $(a + bi)(a + bi)^q = 1$
iff $(a + bi)(a - bi) = 1$ iff $a^2 + b^2 = 1$ iff $(a + bi)^{-1} = a - bi$.
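
The identities in Task 23.13.21 can be checked exhaustively for the smallest Mersenne primes. In the Python sketch below (our own illustration), an element $a + bi$ of $K$ is stored as a pair `(a, b)` of residues modulo $q$; the names `mul`, `power`, and `check` are hypothetical, and the code assumes without testing it that $q = 2^p - 1$ is prime.

```python
def mul(u, v, q):
    """Multiply a + bi and c + di in (Z/qZ)[i], where i^2 = -1."""
    (a, b), (c, d) = u, v
    return ((a * c - b * d) % q, (a * d + b * c) % q)

def power(u, e, q):
    """Compute u^e by repeated squaring."""
    result = (1, 0)
    while e > 0:
        if e & 1:
            result = mul(result, u, q)
        u = mul(u, u, q)
        e >>= 1
    return result

def check(p):
    """Verify the two identities for every a + bi; return how many satisfy a^2 + b^2 = 1."""
    q = 2 ** p - 1                       # assumed to be a Mersenne prime
    norm_one = 0
    for a in range(q):
        for b in range(q):
            u = (a, b)
            assert power(u, q, q) == (a, (-b) % q)      # (a + bi)^q = a - bi
            has_norm_one = (a * a + b * b) % q == 1
            assert (power(u, q + 1, q) == (1, 0)) == has_norm_one
            norm_one += has_norm_one
    return norm_one

print(check(3), check(5))   # 8 32
```

In both cases the number of elements with $a^2 + b^2 = 1$ comes out to $q + 1 = 2^p$, which is consistent with these elements being exactly the elements of order dividing $2^p$ in the cyclic group $K^\times$.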

Task 23.13.22. Assume the hypotheses and notation of Task 23.13.19. Prove
that the following are equivalent:

(i) There is a solution to the equation $\zeta + \zeta^{-1} = 4$ where $\zeta \in K$ and $\zeta$ has
order dividing $2^p$ in $K^\times$.

(ii) $-3 = b^2$ for some $b \in \mathbb{Z}/q\mathbb{Z}$.

We would now like to answer a particular question in elementary number 
theory, namely, for which primes $q$ is $-3$ a perfect square modulo $q$? We adopt
the following notation: 


Notation 23.132. Let $q$ be a prime integer and let $F = \mathbb{Z}/q\mathbb{Z}$. Then $S_q$ denotes
the set of perfect squares in $F^\times$. That is, $S_q := \{a^2 : a \in F^\times\}$.

Task 23.13.23. For each prime integer $q$ starting with $q = 5$, perform
computations to decide whether $-3$ is a perfect square modulo $q$. Repeat until you
have a conjecture that describes such primes in a way that is easy to test. A 
partial answer is given in the following proposition, so try not to look ahead 
in the text before finishing this task. 


Proposition 23.133. Let $q$ be a prime integer with $q > 3$. If $-3$ is a perfect
square modulo $q$, then $q \equiv 1 \pmod{3}$.


Task 23.13.24. Prove Proposition 23.133 by induction on $q$ using the following
steps. First establish the base case, $q = 5$. Inductively suppose the result is
true for all primes strictly between 3 and $q$. Let $F = \mathbb{Z}/q\mathbb{Z}$. Suppose that
$-3 \in S_q$, and show that $a^2 + 3 = cq$ for some $a, c \in \mathbb{Z}$ where $c$ is odd and
$1 \le c < q$. Then break into two sub-cases according to whether or not 3
divides $c$, and use the inductive hypothesis to get the desired conclusion.


Task 23.13.25. Prove Proposition 23.134 below. (You can use the same basic 
technique that worked to prove Proposition 23.133.) 


Proposition 23.134. Let $q$ be a prime integer with $q > 3$. If 3 is a perfect
square modulo $q$, then $q \equiv 1 \pmod{12}$ or $q \equiv -1 \pmod{12}$.


Task 23.13.26. Let $q$ be a positive odd prime integer. Prove that we have
$-1 \in S_q$ iff $q \equiv 1 \pmod{4}$. Then prove that the index of $S_q$ in $(\mathbb{Z}/q\mathbb{Z})^\times$ is 2.
Use these results to prove that if $q \equiv 3 \pmod{4}$ then exactly one of 3 or $-3$
is in $S_q$. Conclude by proving Proposition 23.135 below.


Proposition 23.135. Let $q$ be a prime integer with $q > 3$ and $q \equiv 3 \pmod{4}$.
Then $-3$ is a perfect square modulo $q$ iff $q \equiv 1 \pmod{3}$.
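
Propositions 23.133 through 23.135 lend themselves to a brute-force check on small primes. The Python sketch below is our own illustration (the names `is_prime` and `is_square_mod` are hypothetical, and the bound 60 is arbitrary); it tabulates, for each prime $q > 3$, the residues $q \bmod 3$ and $q \bmod 12$ alongside whether $-3$ and $3$ are perfect squares modulo $q$.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def is_square_mod(a, q):
    """True when a is congruent to x^2 modulo q for some x with 1 <= x < q."""
    return any((x * x - a) % q == 0 for x in range(1, q))

for q in (q for q in range(5, 60) if is_prime(q)):
    print(q, q % 3, q % 12, is_square_mod(-3, q), is_square_mod(3, q))
```

For example, the row for $q = 7$ reports that $-3$ is a square (indeed $2^2 = 4 \equiv -3$) while $3$ is not, in agreement with $7 \equiv 1 \pmod{3}$ and $7 \equiv 7 \pmod{12}$.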


Remark 23.136. Proposition 23.134 is a special case of a result from the late
eighteenth century first proved by Gauss and known as the Law of Quadratic
Reciprocity (LQR). For distinct positive prime integers $p$ and $q$, define
\[
\left(\frac{q}{p}\right) = \begin{cases} 1, & \text{if } q \text{ is a perfect square modulo } p; \\ -1, & \text{otherwise.} \end{cases} \tag{23.22}
\]
In full, LQR states that for odd primes $p$ and $q$ we have
\[
\left(\frac{q}{p}\right) = (-1)^{\left(\frac{p-1}{2}\right)\left(\frac{q-1}{2}\right)} \left(\frac{p}{q}\right), \tag{23.23}
\]
and in the case $q = 2$,
\[
\left(\frac{2}{p}\right) = 1 \text{ iff } p \equiv \pm 1 \pmod{8}. \tag{23.24}
\]
The fraction symbols enclosed in parentheses are called Legendre symbols,
except for the fractions in the exponent, which are ordinary fractions(!).


We now know that if $q$ is a Mersenne prime, then the $r$ value in Lucas’s
Test will be 0 at the end of the $j$-th iteration, where the order of $2 + bi$ is
equal to $2^{j+2}$ and $b$ is either of the two elements of $\mathbb{Z}/q\mathbb{Z}$ such that $b^2 = -3$.
We know that $j + 2 \le p$, and we would like to show that equality holds so
that Lucas’s Test gives the correct output. We would really like to say that
$2 + bi = 2 + \sqrt{-3}\,i = 2 + \sqrt{3}$ in some sense, to simplify our notation. We work
on this next.


Task 23.13.27. Assume the hypotheses and notation of Task 23.13.19. Prove
that $3 \notin S_q$, and deduce that the polynomial $t := x^2 - 3$ is irreducible in $F[x]$.
Show that $K$ is a splitting field for $t$ over $F$, and let $\rho$ be a root of $t$ in $K$.
Show that every element of $K$ can be written uniquely in the form $a + b\rho$
with $a, b \in F$. Prove that $\omega := 2 + \rho$ has order dividing $2^p$ in $K^\times$. Use the
properties of the cyclic group $K^\times$ to show that $\omega$ has order exactly $2^p$ iff $\omega$
is a perfect square but not a perfect fourth power in $K^\times$. Show that the two
elements $\pm(a + a\rho)$, where $a = 2^{(p-1)/2}$, are the square roots of $\omega$ in $K$. But
prove that neither of these two elements is a perfect square in $K$ (you can
reduce this to the assertion that $-1 \notin S_q$). Conclude that the order of $\omega$ in
$K^\times$ is exactly $2^p$.


We now know that Lucas’s Test is correct when given $p$ such that $q := 2^p - 1$
is prime. Let’s consider the contrary case, when $q$ is not prime. First we must
reconsider our choices of the fields $F$ and $K$. Since we still need fields, and
we want Lucas’s Test to have some relevance to these fields, it is natural to
look at the field of order $\ell$ where $\ell$ is a prime factor of $q$; for we can recover
an integer modulo $\ell$ from knowing that integer modulo $q$.


Task 23.13.28. Let $p$ be a positive odd prime integer, and let $q = 2^p - 1$.
Suppose that $q$ is not prime, and let $\ell$ be the smallest prime factor of $q$ in $\mathbb{N}$.
Let $F = \mathbb{Z}/\ell\mathbb{Z}$. Let $K$ be a splitting field for the polynomial $x^{2^p} - 1$ over $F$.
Assume for a contradiction that Lucas’s Test outputs true with input $p$.

(a) Show that 4 is a root of $f_{p-2}$ in $K$ (actually even in $F$!).

(b) Show that there is an element $\zeta \in K^\times$ of order $2^p$ such that $\zeta + \zeta^{-1} = 4$.
(See Task 23.13.18.)

(c) Prove that $\zeta \in K'$ for some field $K'$ such that $F \le K' \le K$ and
$[K' : F] = 2$.

(d) Produce a contradiction using the results above. Hint: We must have
$\ell < 2^{p/2}$ (why?).


This concludes the proof of the correctness of Lucas’s Test. For such a 
simple test, there is a lot of math behind the scenes! 


23.14 Modules 


Prerequisites. Having completed through Chapter 18 is recommended. Having 
completed Project 23.4 is also recommended. 


In this project we introduce a new algebraic category, possibly the most 
important one that we have not studied yet. To discover this category, let 
us reflect on our earlier work and look for similarities among some of the 
structures we studied. Recall that a vector space is an abelian group V together 
with an operation that lets us multiply a field element by an element of V, 
subject to a few natural-seeming axioms; in the definition, we isolated the field 
F in question, and said that V was a vector space over F. Thus we formed the
category of vector spaces over a given field. Meanwhile, within the category of 
rings, we defined an ideal of a ring R to be an abelian group J inside (R, +) 
which allows us to multiply an element of R by an element of J. The common 
theme is that in both cases we have an abelian group together with a ring 
(remember that a field is a very special type of ring!) and an appropriate 
“multiplication” operation involving an element of the ring and an element 
of the group. We can capture both of the above cases by generalizing the 
definition of a vector space over a field, replacing the field with an arbitrary 
ring. Although it is possible to use general rings here, we will restrict ourselves 
to commutative rings with 1. 


Definition 23.137. Let $R$ be a commutative ring with 1. Then a module over
$R$, or $R$-module, is a triple $(M, +, \cdot)$, where $(M, +)$ is an abelian group and
$\cdot : R \times M \to M$ is a function, satisfying the following axioms for all $a, b \in R$
and all $m, n \in M$:

(M1): $(a \cdot b) \cdot m = a \cdot (b \cdot m)$ [Associativity]
(M2(a)): $a \cdot (m + n) = a \cdot m + a \cdot n$ [Left Distributivity]
(M2(b)): $(a + b) \cdot m = a \cdot m + b \cdot m$ [Right Distributivity]
(M3): $1_R \cdot m = m$ [Unitary Law]


Task 23.14.1. Verify that if F is a field, then an F-module is the same thing 
as an F-vector space. 


Task 23.14.2. Let $(R, +, \cdot)$ be a commutative ring with 1. Let $S \subseteq R$. Prove
that $(S, +|_{S \times S}, \cdot|_{R \times S})$ is an $R$-module if and only if $S$ is an ideal of $R$. In
particular, conclude that $R$ itself is naturally an $R$-module.


We assume for the remainder of this project that R always denotes a 
commutative ring with 1. Next, we would like to explore the natural questions 
we have learned to ask for any category: What are the sub-objects, quotient 
objects, and morphisms in the category of R-modules? 


Definition 23.138. Let $(M, +_M, \cdot_M)$ be an $R$-module. A submodule of $M$ is
a triple $(N, +_N, \cdot_N)$ such that $N \subseteq M$, $+_N = +_M|_{N \times N}$, and $\cdot_N = \cdot_M|_{R \times N}$.
We write $N \le M$ to indicate that $N$ is a submodule of $M$, and we write
$N < M$ if $N$ is a proper submodule of $M$.


We note that, as in other categories we have studied, there is only one
possible choice for $+_N$ and $\cdot_N$ that could make $N$ into a submodule of $M$.


Task 23.14.3. Let $M$ be an $R$-module, and let $N \subseteq M$. Prove that $N \le M$
iff $N$ is non-empty and is closed under addition, negation, and multiplication
by elements of $R$.

Task 23.14.4. Show that an ideal of $R$ is the same thing as a submodule of $R$.

In the categories of groups and of rings, we needed to have special types of
sub-objects in order to form quotients (namely, normal subgroups and ideals,
respectively). The situation is simpler in the category of $R$-modules, as you
will prove next: any submodule can serve as a “denominator.”

Task 23.14.5. Let $M$ be an $R$-module and suppose $N \le M$. Let $M/N$ denote
the set of all left cosets of $N$ in $M$ under addition. Prove that $M/N$ is an
$R$-module under the operations
\[
(m_1 + N) + (m_2 + N) = (m_1 + m_2) + N
\]
and
\[
r \cdot (m + N) = (rm) + N
\]
for $m_1, m_2, m \in M$ and $r \in R$.


The definition of a homomorphism in the category of R-modules may not 
come as a surprise: we just need a function that commutes with both opera- 
tions. 


Definition 23.139. Let $M$ and $N$ be $R$-modules. An $R$-module homomorphism
from $M$ to $N$ is a function $\sigma : M \to N$ such that for all $m_1, m_2, m \in M$
and all $r \in R$ we have

(MH1) $\sigma(m_1 + m_2) = \sigma(m_1) + \sigma(m_2)$ and

(MH2) $\sigma(r \cdot m) = r \cdot \sigma(m)$.


Definition 23.140. Let $\sigma : M \to N$ be an $R$-module homomorphism.
The kernel of $\sigma$ is its kernel as a homomorphism of additive groups; that is,
$\ker(\sigma) := \sigma^{-1}(\{0_N\}) = \{m \in M : \sigma(m) = 0_N\}$.

Isomorphisms of $R$-modules fall into the familiar pattern:

Task 23.14.6. Prove that if $\sigma : M \to N$ is a bijective $R$-module homomorphism,
then so is $\sigma^{-1} : N \to M$. Use this to help prove that a bijective
$R$-module homomorphism satisfies the general definition of isomorphism given
in Definition 23.34.


We have a good analog of the Fundamental Theorem of Ring Homomor- 
phisms that works for modules: 


Theorem 23.141 (Fundamental Theorem of Module Homomorphisms).
(i) Let $M$ and $N$ be $R$-modules with $N \le M$. Then the natural map
\[
\sigma : M \to M/N
\]
given by
\[
\sigma(m) = m + N
\]
is a surjective $R$-module homomorphism, and $\ker(\sigma) = N$.
(ii) If $\tau : M \to L$ is any $R$-module homomorphism, then $\tau(M) \le L$,
$\ker(\tau) \le M$, and we have $\tau(M) \cong M/\ker(\tau)$.


Task 23.14.7. Prove Theorem 23.141. 


As we continue to build the theory of modules over a commutative ring 
with 1, the next step is to understand the submodule generated by a subset of 
a module. (Even though a vector space is a special case of a module, we follow 
tradition by using the term “generate” instead of “span.”) It may not surprise 
the reader that the result looks very similar to the corresponding results for 
ideals and for subspaces. 


Notation 23.142. Let $M$ be an $R$-module, and let $S \subseteq M$. Then the set
$$\langle S \rangle := \{a_1 s_1 + \cdots + a_k s_k : k \in \mathbb{N},\ a_i \in R,\ s_i \in S\}$$
is called the $R$-module generated by $S$.


Lemma 23.143. Let $M$ be an $R$-module, and let $S \subseteq M$. Then $\langle S \rangle \le M$,
and $\langle S \rangle$ is the smallest submodule of $M$ which contains $S$, in the sense that
if $N \le M$ and $S \subseteq N$, then $\langle S \rangle \subseteq N$.


Task 23.14.8. Prove Lemma 23.143.
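
The generated submodule can be computed directly in a small case. The sketch below assumes R = Z and M = Z/12Z, and forms all Z-linear combinations of a finite list S; since Z acts on Z/12Z through residues mod 12, coefficients in `range(12)` are enough. The names `generated` and `is_submodule` are hypothetical helpers.

```python
from itertools import product

# Sketch only: R = Z acting on M = Z/12Z.
MOD = 12


def generated(S):
    """All combinations a_1 s_1 + ... + a_k s_k (mod 12) of the elements of S."""
    S = list(S)
    return {
        sum(a * s for a, s in zip(coeffs, S)) % MOD
        for coeffs in product(range(MOD), repeat=len(S))
    }


def is_submodule(N):
    """Check 0 in N, closure under addition, and closure under scalars."""
    return (
        0 in N
        and all((x + y) % MOD in N for x in N for y in N)
        and all((r * x) % MOD in N for r in range(MOD) for x in N)
    )
```

For instance, `generated([8, 6])` is the set of even residues mod 12. With an empty list the only combination is the empty sum, so `generated([])` is `{0}`.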


We often care about the number of generators needed to get a given mod- 
ule. Accordingly, we make the following definitions. 


Definition 23.144. A module M is cyclic if there exists a singleton set which 
generates M. 


Definition 23.145. A module $M$ is finitely generated if there exists a finite
set which generates $M$.


Next, we will study the idea of “freeness” in the category of R-modules. 
Careful attention to our previous results about free objects, as highlighted in 
Task 23.5.2, leads us to think of freeness as a universal property akin to that 
of a direct product or coproduct. We will define a free R-module using this 
idea: 


Definition 23.146. Let $M$ be an $R$-module, and let $S \subseteq M$. Then we say
that $M$ is free on the set $S$ if for every $R$-module $N$ and every function
$f : S \to N$, there is a unique extension of $f$ to an $R$-module homomorphism
$\sigma : M \to N$.




Definition 23.147. An $R$-module $M$ is called free if there exists a non-empty
set $S \subseteq M$ such that $M$ is free on $S$.


In the following task, we will see that free modules are very special objects. 


Task 23.14.9. Use Definition 23.146 to prove that a free R-module on a non- 
empty set S is uniquely determined up to isomorphism by the cardinality of 
S. Do not assume that S is finite. 

Task 23.14.10. Let $F$ be a field. Prove that an $F$-vector space $V$ is finitely
generated as an $F$-module if and only if $V$ is finite-dimensional over $F$. Use
this result to help prove that every finitely generated $F$-vector space is free.


It is accepted wisdom among algebraists that if you want to understand a 
ring, then you should study its modules. In order to see more deeply, we should 
study these modules up to isomorphism. We have seen in Exercise 13.12 that 
every $n$-dimensional vector space over a field $F$ is isomorphic to $F^n$ under
componentwise operations. Combining this result with that of Task 23.14.10, 
we see that every finitely-generated module over a field is isomorphic to a 
finite number of copies of that field. We can interpret this as confirming that 
fields are indeed the nicest type of ring, in that their modules are particularly 
easy to describe, with only one parameter, the dimension. 

At this point, we make another observation to connect module theory with 
our work in previous chapters. Recall from Lemma 17.16 and Exercise 17.12 
that we were able to construct a coproduct in the category of abelian groups 
by taking Cartesian products and componentwise operations. We can use the 
same type of construction with modules. 


Proposition 23.148. Let $\mathcal{C} = (M_i)_{i \in \mathcal{I}}$ be an indexed collection of $R$-modules.
Let $S = \bigoplus_{i \in \mathcal{I}} M_i$ denote the subset of $\prod_{i \in \mathcal{I}} M_i$ where all but finitely many
of the components are 0. Define addition in $S$ componentwise. Also define
$r \cdot (m_i)_{i \in \mathcal{I}}$ to be $(r \cdot m_i)_{i \in \mathcal{I}}$ for $r \in R$. Then $S$ is a coproduct of $\mathcal{C}$ in the
category of $R$-modules, where we take the maps $\iota_i : M_i \to S$ to be the
natural embeddings.


Task 23.14.11. Prove Proposition 23.148. 


Remark 23.149. Just as with abelian groups, we refer to the coproduct defined 
in Proposition 23.148 as a direct sum. 


We know from Task 23.14.9 that free R-modules on a given set are es- 
sentially unique. But we still do not officially know whether free R-modules 
always exist. In fact they do, and the situation for vector spaces turns out to 
generalize. 


Task 23.14.12. Prove that R is free (as an R-module) on the set {1}. 


Task 23.14.13. Prove that every cyclic R-module is isomorphic to R/I for 
some ideal I of R. 


Notice that we have a ring structure on $R/I$ in addition to an $R$-module
structure. These structures are compatible in the sense that for all $a, b \in R$
we have $a \cdot (b + I) = (a + I) \cdot (b + I)$, where the left-hand side of this equation
is calculated in the $R$-module and the right-hand side in the ring structure;
both calculations give the coset $(ab) + I$ of $I$ in $R$.


Task 23.14.14. Let $I$ be an ideal of $R$, and let $a \in R$. Suppose that $M \cong R/I$
as $R$-modules. Show that we have $a \in I$ iff for all $m \in M$, $a \cdot m = 0_M$. Thus,
we can recover $I$ from $M$.


Definition 23.150. Let $M$ be an $R$-module. The annihilator of $M$ in $R$ is
the set
$$\operatorname{Ann}_R(M) := \{a \in R : a \cdot m = 0 \text{ for all } m \in M\}.$$


Task 23.14.15. Let $M$ be an $R$-module. Prove that $\operatorname{Ann}_R(M)$ is an ideal of
$R$.


Task 23.14.16. Let $N \le M$ be $R$-modules. Prove that $\operatorname{Ann}_R(M) \subseteq \operatorname{Ann}_R(N)$.
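
A numerical illustration of annihilators may help here. The sketch assumes R = Z and M = Z/4Z × Z/6Z with componentwise operations. Every ideal of Z is principal, and the annihilator here is non-zero because 24 kills every element, so it is generated by its smallest positive element. The helper names `act`, `kills`, and `annihilator_generator` are illustrative assumptions, not notation from the text.

```python
from itertools import product

# Sketch only: the Z-module M = Z/4Z x Z/6Z with componentwise operations.
M = list(product(range(4), range(6)))
N = [(0, y) for y in range(6)]  # the submodule {0} x Z/6Z


def act(a, m):
    """The integer a acting on the pair m."""
    return ((a * m[0]) % 4, (a * m[1]) % 6)


def kills(a, module):
    """True when a . m = 0 for every m in the given list of elements."""
    return all(act(a, m) == (0, 0) for m in module)


def annihilator_generator(module, bound=100):
    """Smallest positive integer killing every element (searched up to bound)."""
    return next(a for a in range(1, bound + 1) if kills(a, module))
```

In this example the annihilator of M is generated by 12 and the annihilator of N by 6. Since 6 divides 12, the ideal (12) is contained in (6), which agrees with the inclusion in Task 23.14.16.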


Task 23.14.17. Let $M$ be a cyclic $R$-module. Prove that $M$ is free iff
$\operatorname{Ann}_R(M) = (0)$.


In the case of vector spaces, we have the notion of a basis, which can serve 
as a coordinate system; the reader may have suspected that for a module 
which is free on a set S, the set S plays the role of a basis. The next task 
confirms this. 


Task 23.14.18. Let $M$ be an $R$-module which is free on a set $S \subseteq M$. Prove that
every element of $M$ can be written uniquely as a finite $R$-linear combination
of elements of $S$: that is, prove that for every $m \in M$, we can write
$m = \sum_{\alpha \in S} r_\alpha \cdot \alpha$ where $r_\alpha \in R$, all but finitely many of the $r_\alpha$ are $0_R$, and the $r_\alpha$
are uniquely determined by $m$.


Another place we have seen a similar unique sum decomposition is in the 
study of direct sums of abelian groups, with Proposition 17.20. The next task 
asks you to prove the corresponding result for modules. 


Task 23.14.19. Let $M$ be an $R$-module, and let $\mathcal{C} = (M_i)_{i \in \mathcal{I}}$ be an indexed
collection of submodules of $M$.

(a) Prove that the function $\sigma : \bigoplus_{i \in \mathcal{I}} M_i \to M$ given by the formula
$(m_i)_{i \in \mathcal{I}} \mapsto \sum_{i \in \mathcal{I}} m_i$ is an $R$-module homomorphism.

(b) Let $N = \operatorname{Im}(\sigma)$. Prove that $\sigma$ is injective iff every element of $N$ can be
written uniquely as a finite $R$-linear combination of elements of the modules
$M_i$. Note: we naturally use the notation $N = \sum_{i \in \mathcal{I}} M_i$ in case $\mathcal{I}$ is finite.


In light of the preceding results, we expect there to be a connection between 
direct sums and freeness: 
Task 23.14.20. Let $\mathcal{C} = (M_i)_{i \in \mathcal{I}}$ be an indexed collection of free $R$-modules.
Prove that $\bigoplus_{i \in \mathcal{I}} M_i$ is free. Conclude that, in particular, $\bigoplus_{i=1}^{n} R = R^n$ is free
(on a set of size $n$) for any positive integer $n$.

Combining Tasks 23.14.20 and 23.14.9, we see that every free $R$-module
on a set of size $n$ is isomorphic to $R^n$. Now that we have a fair understanding
of free $R$-modules, as well as of finitely generated modules over a field (which
all turn out to be free), let us explore modules over $\mathbb{Z}$.

To say that M is a Z-module means that M is an abelian group under an 
operation +, and that there is a way to multiply an integer n by an element of 
M which satisfies the module axioms. But we already know a way to compute 
$n \cdot m$ for any $n \in \mathbb{Z}$ and $m \in M$, by interpreting this product as exponential
notation in the abelian group (M, +). 


Task 23.14.21. Let $(M, +)$ be an abelian group. Define $\cdot : \mathbb{Z} \times M \to M$ to be
the exponential function of Definition 3.38 in additive notation. Verify that
Theorem 3.41 and Lemma 3.44 imply that $(M, +, \cdot)$ is a $\mathbb{Z}$-module.


Thus, given any abelian group, there is a natural way to make it into a 
Z-module. In the next task, you will show that this is the only way. 


Task 23.14.22. Let $R$ be a commutative ring with 1, and let $M$ be an $R$-module.
Let $\chi : \mathbb{Z} \to R$ be the characteristic map of Definition 16.21. Prove
that for all $n \in \mathbb{Z}$ and all $m \in M$ we have $\chi(n) \cdot m = n \cdot m$, where the
left-hand side is multiplication in the $R$-module $M$, and the right-hand side is
the exponential function in the group $(M, +)$. Conclude that in the case $R = \mathbb{Z}$,
there is at most one multiplication function which makes $(M, +)$ into an
$R$-module.
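
The following sketch illustrates the exponential function in additive notation: it builds n · m using only the group's addition, negation, and zero element, and then spot-checks the module axioms on a finite range of integers. The group Z/4Z × Z/6Z and the names `add`, `neg`, `n_times`, and `is_z_module` are assumptions chosen for illustration; a finite check of this kind is not a proof.

```python
# Sketch only: the abelian group G = Z/4Z x Z/6Z, given by its addition.
G = [(a, b) for a in range(4) for b in range(6)]
ZERO = (0, 0)


def add(x, y):
    return ((x[0] + y[0]) % 4, (x[1] + y[1]) % 6)


def neg(x):
    return ((-x[0]) % 4, (-x[1]) % 6)


def n_times(n, m):
    """n . m computed by repeated addition (of -m when n is negative)."""
    if n < 0:
        n, m = -n, neg(m)
    result = ZERO
    for _ in range(n):
        result = add(result, m)
    return result


def is_z_module(scalars=range(-5, 6)):
    """Spot-check the module axioms for the scalars in the given range."""
    for m in G:
        if n_times(1, m) != m:
            return False
        for a in scalars:
            for b in scalars:
                if n_times(a + b, m) != add(n_times(a, m), n_times(b, m)):
                    return False
                if n_times(a * b, m) != n_times(a, n_times(b, m)):
                    return False
            for m2 in G:
                if n_times(a, add(m, m2)) != add(n_times(a, m), n_times(a, m2)):
                    return False
    return True
```

Note that `n_times` never multiplies an integer by a group element directly; the scalar action is recovered entirely from the group structure, which is the point of Tasks 23.14.21 and 23.14.22.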


The preceding results suggest that a Z-module is essentially the same thing 
as an abelian group. Earlier in this project, we said that an F-module is the 
same thing as an F-vector space (where F is any field); and this is strictly 
true, because both types of structure are ordered triples which satisfy exactly 
the same axioms. But it is not strictly true that a Z-module is the same thing 
as an abelian group, because while a Z-module is an ordered triple $(M, +, \cdot)$,
an abelian group is an ordered pair $(M, +)$; there is no way they can be equal!
Yet we feel that there should be a nice way to bridge this difference. To do 
so, we will go a bit further into category theory by introducing the concept 
of a functor, which plays roughly the same role between two categories as 
a morphism does between two objects within the same category: a functor 
expresses a relationship between two entire categories. 


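Definition 23.151. Let $\mathcal{C}$ and $\mathcal{D}$ be two categories. A functor $F$ from $\mathcal{C}$ to
$\mathcal{D}$ is an assignment for each object $A$ of $\mathcal{C}$ of an object $FA$ of $\mathcal{D}$ and for each
morphism $\sigma : A \to B$ in $\mathcal{C}$ of a morphism $F\sigma : FA \to FB$ in $\mathcal{D}$, such that:
(F1): $F(\mathrm{id}_A) = \mathrm{id}_{FA}$ for every object $A$ of $\mathcal{C}$, and
(F2): $F(\tau \circ \sigma) = F(\tau) \circ F(\sigma)$ for any two morphisms $\sigma$ and $\tau$ of $\mathcal{C}$ such
that $\tau \circ \sigma$ is defined.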


In case C = D, there is always at least one functor available, namely the 
identity functor: 


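Definition 23.152. The identity functor from $\mathcal{C}$ to $\mathcal{C}$ is the functor $\mathrm{id}_{\mathcal{C}}$ defined
by $\mathrm{id}_{\mathcal{C}} A = A$ and $\mathrm{id}_{\mathcal{C}} \sigma = \sigma$ for all objects $A$ of $\mathcal{C}$ and all morphisms $\sigma$ in $\mathcal{C}$.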




Task 23.14.23. Prove that functors preserve isomorphisms. More precisely,
suppose that $F$ is a functor from $\mathcal{C}$ to $\mathcal{D}$ and that $\sigma : A \to B$ is an isomorphism
in $\mathcal{C}$. Prove that $F\sigma : FA \to FB$ is an isomorphism in $\mathcal{D}$.

The notion corresponding to isomorphism at the level of categories is called 
equivalence: 


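Definition 23.153. Let $\mathcal{C}$ and $\mathcal{D}$ be two categories. We say that these categories
are equivalent if there is a functor $F$ from $\mathcal{C}$ to $\mathcal{D}$ and a functor $G$ from
$\mathcal{D}$ to $\mathcal{C}$ with the following properties:
(E1a): for every object $A$ of $\mathcal{C}$, there is an isomorphism $\delta_A : GFA \to A$;
(E1b): for every object $X$ of $\mathcal{D}$, there is an isomorphism $\varepsilon_X : FGX \to X$;
(E2a): for every morphism $\sigma : A \to B$ in $\mathcal{C}$, the diagram

$$
\begin{array}{ccc}
GFA & \xrightarrow{\;GF\sigma\;} & GFB \\
{\scriptstyle \delta_A}\downarrow & & \downarrow{\scriptstyle \delta_B} \\
A & \xrightarrow{\;\sigma\;} & B
\end{array}
\tag{23.25}
$$

commutes; and
(E2b): for every morphism $\tau : X \to Y$ in $\mathcal{D}$, the diagram

$$
\begin{array}{ccc}
FGX & \xrightarrow{\;FG\tau\;} & FGY \\
{\scriptstyle \varepsilon_X}\downarrow & & \downarrow{\scriptstyle \varepsilon_Y} \\
X & \xrightarrow{\;\tau\;} & Y
\end{array}
\tag{23.26}
$$

commutes.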


Task 23.14.24. Prove that the category of Z-modules is equivalent to the 
category of abelian groups. 

We saw in Task 23.14.17 that the annihilator of a module is an obstacle to
freeness. While the annihilator of $M$ looks for solutions $r \in R$ to the equation
$r \cdot m = 0$, we next look at solutions $m \in M$:


Definition 23.154. Let $M$ be an $R$-module. Then
$$\operatorname{Tor}_R(M) = \{m \in M : r \cdot m = 0 \text{ for some } r \in R - (0)\}.$$
An element of $\operatorname{Tor}_R(M)$ is called a torsion
element of $M$. We say that $M$ is a torsion module if $\operatorname{Tor}_R(M) = M$, and that
$M$ is torsion-free if $\operatorname{Tor}_R(M) = \{0_M\}$.


Task 23.14.25. Find an example of a commutative ring $R$ with $1 \ne 0$ and an
$R$-module $M$ such that $\operatorname{Tor}_R(M)$ is not a submodule of $M$. But prove that if
$R$ is a domain, then we have $\operatorname{Tor}_R(M) \le M$.
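
For experimenting with the definition of torsion, the sketch below computes the torsion elements of R = Z/nZ regarded as a module over itself, for n at least 2. The setting and the helper names `torsion` and `closed_under_addition` are assumptions made for illustration. When n is prime (for example n = 7), R is a field and the only torsion element is 0; trying other values of n is one reasonable way to begin exploring the first part of Task 23.14.25.

```python
def torsion(n):
    """Torsion elements of Z/nZ as a module over itself (n >= 2):
    all m with r * m = 0 for some non-zero r in Z/nZ."""
    return {
        m for m in range(n)
        if any((r * m) % n == 0 for r in range(1, n))
    }


def closed_under_addition(subset, n):
    """True when x + y (mod n) stays in subset for all x, y in subset."""
    return all((x + y) % n in subset for x in subset for y in subset)
```

A typical use is to call `torsion(7)` and then pass the result to `closed_under_addition(..., 7)`.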

We defined torsion as a way to investigate freeness; and Task 23.14.25 
suggests that this idea will be most useful for modules over domains. The next 
task strengthens the relationship between torsion and freeness for domains: 


Task 23.14.26. Let R be a domain. Prove that every free R-module is torsion- 
free. 




Naturally, we want to know whether the converse is true. 


Task 23.14.27. Let $k$ be a field, and let $R = k[x, y]$. (Note that $R$ is a UFD.)
Consider the $R$-module $M = (x, y)$, the ideal of $R$ generated by $x$ and $y$. Prove
that $M$ is torsion-free but not free.


Instead of abandoning our question, let us work with an even nicer family 
of rings. We know that everything works fine over fields, but Task 23.14.27 
shows that UFDs are not good enough. Therefore, we turn to PIDs. 


Task 23.14.28. Let R be a PID, and let M be a non-trivial finitely-generated 
torsion-free $R$-module. Prove that $M$ is free on $S$ for any generating set $S \subseteq M$
of minimum size. Suggestion: Induct on the size of S, and use Exercise 20.22 
to help. 


Next, we will see that the finitely-generated condition really makes a dif- 
ference for modules over a PID. (Compare this to modules over a field, where 
the Axiom of Choice enables us to show that every non-trivial vector space is 
free.) 


Task 23.14.29. Let M = (Q,+), considered as a Z-module. Prove that M is 
torsion-free but not free. 


We have studied torsion modules and free modules, but these are two 
extreme cases; Task 23.14.27 shows that they are not exhaustive. A natural 
question is: If we “remove” the torsion part of a module, are we left with 
a torsion-free module? This question makes the most sense over a domain, 
where the torsion part of a module is a submodule:


Task 23.14.30. Let $R$ be a domain, let $M$ be an $R$-module, and let $T =
\operatorname{Tor}_R(M)$. Prove that $M/T$ is torsion-free.


Task 23.14.31. Let $M$ be a finitely-generated $R$-module and let $N \le M$.
Prove that $M/N$ is also finitely-generated. Specifically, if $M = \langle S \rangle$, prove
that $M/N = \langle \{a + N : a \in S\} \rangle$.


Over a PID, it turns out that every finitely generated module splits into 
its torsion part and a free part: 


Task 23.14.32. Let $R$ be a PID, and let $M$ be a finitely-generated $R$-module.
Let $T = \operatorname{Tor}_R(M)$, and let $F = M/T$. Prove that $F$ is free on some finite set
$S \subseteq F$. Further, prove that $M$ is a direct sum of $T$ and $F$.


Task 23.14.32 gives an important structure theorem for finitely-generated 
modules over PIDs. Since Z is a PID, we will try to find guidance in the 
Fundamental Theorem of Finite Abelian Groups, Theorem 18.31. We know 
that a finitely-generated free $\mathbb{Z}$-module is isomorphic to $\mathbb{Z}^n$ for some $n \in \mathbb{Z}^+$,
and hence is infinite as a set. Therefore, a finite abelian group must have a 
trivial free part, so must be pure torsion. Of course, a finite abelian group is 
also finitely-generated. 


Task 23.14.33. Let (M,+) be an abelian group. Prove that M is finite iff M 
is both finitely-generated and a torsion Z-module. 




Before we can apply Theorem 18.31 to the study of finitely-generated 
abelian groups, we need to know that the torsion part of a finitely-generated 
abelian group is also finitely-generated. The next task takes care of this. 
Task 23.14.34. Let $M$ be an $R$-module, and suppose that $M$ is an internal
direct sum of two submodules $M_1$ and $M_2$. Prove that $M$ is finitely-generated
iff both $M_1$ and $M_2$ are finitely-generated.


We are now ready to generalize Theorem 18.31. 


Theorem 23.155 (Fundamental Theorem of Finitely-Generated Abelian 
Groups). Every finitely-generated abelian group is a direct sum of cyclic 
groups. 


Task 23.14.35. Prove Theorem 23.155. 


In order to generalize Theorem 23.155 even further, to finitely generated 
modules over any PID, we must reconsider the proof techniques used in Chap- 
ter 18. There we looked at the cardinality of an abelian group and its prime 
factorization. In the more general case, we cannot do this: 


Task 23.14.36. Let $R = \mathbb{Q}[x]$ and $M = R/(x^2 + 1)$. Prove that $R$ is a PID,
$M$ is a finitely-generated torsion $R$-module, and $M$ is infinite.

The key to generalizing the results of Chapter 18 from Z to any PID lies 
in replacing cardinalities with annihilators. 


Task 23.14.37. Let $R$ be a PID, and let $M$ be a non-zero finitely-generated
torsion $R$-module. Prove that we have $\operatorname{Ann}_R(M) = (a)$ for some $a \in R - (0)$.


Next we define the generalization of a p-group for modules over a PID. 


Definition 23.156. Let $R$ be a PID, and let $p$ be a non-zero prime ideal of $R$.
A finitely generated $R$-module $M$ is called a $p$-torsion module if $\operatorname{Ann}_R(M) =
p^e$ for some $e \in \mathbb{Z}^+$.


We also want to define the product of an ideal with a module, in order to 
mimic the construction of kG in Notation 18.14. 


Definition 23.157. Let $M$ be an $R$-module and let $A$ be an ideal of $R$. Then
$A \cdot M$ denotes the submodule $\langle \{a \cdot m : a \in A \text{ and } m \in M\} \rangle$.


With the above definition in hand, we are ready to generalize Proposition 
18.12: 


Task 23.14.38. Let $M$ be a non-zero finitely-generated torsion module over
a PID $R$, and write $\operatorname{Ann}_R(M) = A = \prod_{i=1}^{n} p_i^{e_i}$ where the $p_i$ are distinct prime
ideals of $R$ and $e_i \in \mathbb{Z}^+$. For each $i$, let $A_i = \prod_{j \ne i} p_j^{e_j}$ and $M_i = A_i \cdot M$. Prove
that $M_i$ is a $p_i$-torsion module, and that $M = \bigoplus_{i=1}^{n} M_i$.
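
The decomposition in Task 23.14.38 can be watched in the smallest interesting case. The sketch assumes R = Z and M = Z/12Z, so that the annihilator of M is (12) = (2)^2 (3); the two ideals used to cut out the primary pieces are then (3) and (2)^2 = (4). The helper names `ideal_times_module` and `annihilator_generator` are hypothetical, and the brute-force check is only an illustration of the statement, not a proof.

```python
# Sketch only: R = Z, M = Z/12Z, with annihilator (12) = (2)^2 (3).
MOD = 12
M = range(MOD)


def ideal_times_module(a):
    """(a) . M for the principal ideal (a) of Z; here this is just a*M."""
    return sorted({(a * m) % MOD for m in M})


def annihilator_generator(N):
    """Smallest positive a with a*x = 0 in Z/12Z for every x in N."""
    return next(
        a for a in range(1, MOD + 1)
        if all((a * x) % MOD == 0 for x in N)
    )


M1 = ideal_times_module(3)  # the (2)-torsion piece, cut out by the ideal (3)
M2 = ideal_times_module(4)  # the (3)-torsion piece, cut out by the ideal (4)

# For each m in M, list every way of writing m = x + y with x in M1, y in M2.
decompositions = {
    m: [(x, y) for x in M1 for y in M2 if (x + y) % MOD == m]
    for m in M
}
```

In this example `M1` is annihilated exactly by (4) = (2)^2 and `M2` exactly by (3), and every element of M has exactly one entry in its `decompositions` list, as the direct sum statement predicts.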

It remains to study the structure of p-torsion modules in order to generalize
Proposition 18.23. The following task is a nice substitute for a consequence of
Lagrange’s Theorem in this setting.




Task 23.14.39. Prove that a submodule of a p-torsion module is also a p-torsion 
module. 


Note that the ideal $p^e$ gets larger as the exponent $e$ gets smaller; this
accounts for the use of the word maximal in the following proposition. 


Proposition 23.158. Let $M$ be a non-zero finitely-generated $p$-torsion module
over a PID $R$, where $p$ is a non-zero prime ideal of $R$. Let $m_1, \ldots, m_n$ be
generators of $M$ as an $R$-module such that $\prod_{i=1}^{n} \operatorname{Ann}_R(m_i)$ is maximal. Then
the natural map $\sigma : \bigoplus_{i=1}^{n} \langle m_i \rangle \to M$ is an isomorphism.

In particular, M is a direct sum of cyclic modules. 


Task 23.14.40. Prove Proposition 23.158. 
Putting our previous results together, we get: 


Theorem 23.159. Every finitely-generated module over a PID is a direct 
sum of cyclic modules. 


Task 23.14.41. Prove Theorem 23.159. 



Index 


p-group, 200 
p-Sylow subgroup, 217 


abelian group, 24 
abelianization of a group, 94 
adjoining elements to a field, 160 
algebraic 

closure, 299 

element, 155 

extension field, 155 
algebraically closed field, 219, 265 
algebraically independent set, 152 
alternating group, 282 
annihilator, 359 


array notation for a permutation, 27 


ascending chain condition, 230 
associates (in a ring), 229 
automorphism 

of groups, 86 

inner, 88 

of rings, 103 

as a symmetry, 88 
automorphism group of a field 

extension, 170 

Axiom of Choice, 10, 76, 135, 298


base field, 126 

basis 
ordered, 304 
standard, 315 

basis of a vector space, 130 
is a minimal spanning set, 132 
size invariance, 133 

bijective function, 7 

binary operation, 22 
associative, 22 
commutative, 23 


identity element for, 23 
inverse of an element with 
respect to, 23 
as “multiplication”, 22 
Binomial Theorem, 184 
block-diagonal, 325 


cancellation, 33 
left, 288 
Cartesian product, 4 
category, 300 
Cauchy’s Theorem, 216 
Cayley table, 26 
Cayley’s Theorem, 95 
center 
of a group, 88 
of a ring, 111 
centralizer of a group element, 45 
chain of ideals, 122 
change-of-basis formula, 305, 306 
characteristic 
of a field (classification), 183 
polynomial of a square matrix, 
316 
of a ring, 175 
subgroup, 205 
Chinese Remainder Theorem, 189 
closure 
under conjugation, 89 
under a group operation, 39 
under inverses, 39 
under ordinary addition and 
multiplication, 19 
under scalar multiplication, 128 
commutative 
binary operation, 23 
diagram, 59, 186, 190 




ring, 99 
commutator, 94 
complete ring, 332 
complex conjugation, 172, 258, 260 
componentwise operations, 127, 188 
composition 
of functions, 6 
of polynomials, 152 
congruence 
modulo an ideal, 253 
modulo an integer, 54, 68 
conjugacy class, 214, 220 
conjugate, 18 
as a change of basis, 306 
of a field element, 172 
of a group element, 63 
and irreducible polynomials, 182 
constructible 
angle, 249 
number, 243, 245 
formal definition, 247 
regular polygon, 250 
content of a polynomial, 236 
contrapositive, 10 
coordinate vector (with respect to a 
basis), 304 
coproduct, 190 
of abelian groups (=direct sum), 
192 
of arbitrary groups, 195 
coset, 65 
representative, 75 
cycle, 213, 220 
cyclic group, 42 
classification, 85 
subgroups of, 79 
cyclotomic field, 251 
generalization, 262 
cyclotomic polynomial, 251 
formula, 261 


degree of a polynomial, 144, 238 
derivative 

of a polynomial, 173 

of a power series, 334 




determinant, 308 

explicit formula, 311 
diagonal matrix, 309 
diagonalizable matrix, 314 
dihedral group, 49 

is determined by 3 relations, 97 
dimension 

of a ring, 122 

of a vector space, 135 
direct product 

of groups, 194 

of rings, 187, 188 
direct sum, 192, 195 
Dirichlet’s Theorem on primes in 

arithmetic progressions, 261 
disjoint, 4 
divisibility in a commutative ring, 
114 

domain (=integral domain), 102 
domain of a function, 5 
dot product (of vectors), 320 
doubling the cube, 248 


eigenbasis, 314 
eigenvalue, 314 
eigenvector, 314 
Eisenstein’s Criterion for 
irreducibility, 248 

elementary abelian p-group, 202 
embedding, 87, 103, 137 
empty 

product, 234 

set, 2 

sum, 116 

word, 55 
equivalence relation, 74 
Euclidean domain, 339 
Euler’s Theorem, 195 
exactly divides, 204 
exponent 

of a group, 202 

of a prime ideal, 234 
extension field, 154 


factorial, 35 




factoring 
a group homomorphism, 84, 89 
a polynomial completely, 159 
a ring homomorphism, 111 
Fermat prime, 255 
Fermat’s Little Theorem, 123 
Fibonacci sequence, 317 
field, 102 
cyclotomic, 251 
finite, 158 
fixed, 170 
of fractions, 228 
ground, 164 
of rational functions, 156, 228 
of scalars, 126 
splitting, 160 
finite extensions are algebraic, 165 
finite field, 158, 166, 184, 262 
fixed field, 170, 179, 184 
fraction, 224, see also localization 
free 
abelian group, 97 
group, 56 
function, 5 
identity, 6 
restriction, 7 
Fundamental Theorem 
of algebra, 219 
of arithmetic, 228
of finite abelian groups, 207
of Galois theory, 179
of group homomorphisms, 83
of module homomorphisms, 357
of ring homomorphisms, 109


Galois 
field, 121 
field extension, 178 
group, 178 
Gauss’ Lemma, 236 
Gaussian integers, 340 
general linear group 
2 by 2 over R, 100 
general case, 139 
generator, 42 




greatest common divisor (gcd), 2, 
232, 235 
ground field, 164 
group, 23 
abelian, 24 
alternating, 282 
Cayley table for, 26 
cyclic, 42 
dihedral, 49 
free, 56 
of prime order is cyclic, 77 
orthogonal, 324 
quotient, 67 
solvable, 271 
symmetric, 26 
trivial, 25 
group action, 211 


hashing (in computer science), 335 
homogeneous polynomial, 284 
homomorphism 

of groups, 57 

of rings, 103 

over a subring, 170 
of vector spaces, see linear 
transformation 


ideal, 105 
generated by a set, 116 
maximal, 120 
prime, 113 
principal, 117 
proper, 113 
ideal product, 233 
ideals 
of a field, 110 
of a quotient ring, 108 
image 
of an element, 6 
of a polynomial evaluation map, 
146 
of a set, 6 
index 
of a field extension, 163 
of nested field extensions, 163, 
165 




of nested subgroups, 92 
of a subgroup, 76 
indexed collection, 9 
induced homomorphism, 84 
injective function, 7 
integral element, 241 
intersection, 4 
of normal subgroups is a normal 
subgroup, 91 
of subgroups is a subgroup, 43 
of two ideals is an ideal, 110 
inverse of a product, 36 
irreducible 
element, 114 
polynomial, 155 
isomorphism, 82, 103, 137 
class, 83 
is an equivalence relation, 83 
in a general category, 301 


kaleidoscope principle, 54, 180, 291 
kernel 
and injectivity, 87, 111 
of a group homomorphism, 60 
is a normal subgroup, 64 
of a ring homomorphism, 104 
is an ideal, 109 
Krull dimension, 122 


Lagrange’s Theorem, 77 
converse (for cyclic groups), 208 
converse (for finite abelian 
groups), 209 
converse (for p-groups), 215, 221 
Laurent series, 333 
law 
cancellation (in a group), 33 
of exponents (in a group), 32 
group (=group operation), 24 
of quadratic reciprocity, 353 
velocity addition (in special 
relativity), 36 
least common multiple (lcm), 2 
lift (from a quotient group), 71, 220 
limit (in a complete ring), 332 




linear combination 

of ring elements, 116 

of vectors, 129 

linear transformation, 137 
linearly independent, 132 
localization of a domain, 225 
at one element, 333 
logarithm (as a group 
homomorphism), 61 


map, mapping (=function), 6 
matrix 
diagonal, 309 
diagonalizable, 314 
of a linear transformation, 138 
scalar, 139 
transpose of, 313 
triangular, 313 
maximal ideal, 120 
is prime, 121 
Mersenne prime, 349 
module, 355 
monic polynomial, 148 
morphism, 190, 300 
multiplicatively closed subset of a 
ring, 224 
multiplicity of a root, 174 


Nakayama’s Lemma, 208 
natural map, 69, 109, 175, 193, 225, 
357 
noetherian condition, 230 
non-singular matrix, 310 
norm map, 260 
normal closure, 183 
normal field extension, 176 
intransitivity, 184 
normal subgroup, 64 
generated by a set, 92 
normalizer of a subgroup, 71 
number game 
on finite extensions of Q, 184 
on $R = \mathbb{Z}[\sqrt{2}]$, 16, 111, 172
on a subring of C, 104 
on Z, 15, 16, 111 




object, 190, 300 

operator notation, 22 

orbit, 212 

Orbit-Stabilizer Lemma, 212 

order of an element 
in a direct sum of groups, 209 
in a group, 77 

ordered basis, 304 

orthogonal group, 324 


partial fractions (in calculus), 151 
partition, 73 
permutation of a set, 26 
sign of, 311 
polynomial, 141 
characteristic, 316 
cyclotomic, 251 
degree of, 144, 238 
evaluation, 145 
formal definition, 142 
homogeneous, 284 
monic, 148 
root, 149 
solvable (by radicals), 265 
polynomial ring, 141 
over a field is a vector space, 151 
in several variables, 152, 238 
over a UFD is a UFD, 237 
poset, 296 
power series, 331 
prime 
element, 114 
generates prime ideal, 123 
is irreducible, 123 
ideal, 113 
integer, 2 
subfield, 184 
primitive root of unity, 250 
principal ideal, 117 


principal ideal domain (PID), 119, 
148 
product rule for derivatives, 173 
proof, 8 
proper 
ideal, 113 


subset, 1 


quadratic 
extension field, 166, 245, 247 
probing function, 336 
reciprocity law, 353 

quotient 
group, 67 

size of (in a finite group), 77 

ring, 106 

Quotient-Remainder Theorem, 147 


radical 
element, 264 
expression, 264 
Rational Root Theorem (in a general 
UFD), 259 
regular polygon, 249 
relation 
in a group, 51 
standard form of, 60 
on a set, 74 
relatively prime, 2 
resultant of two polynomials, 346, 
347 
ring, 99 
commutative, 99 
of Gaussian integers, 340 
of square matrices over a field, 
139 
with 1, 99 
root bound on polynomials, 150 
violation of, 151 
root of a polynomial, 149 
existence of, 158 
repeated, 174 
Root-Factor Theorem, 147, 149 


scalar, 126 
separable 
element, 175 
field extension, 175 
set, 1 
empty, 2 
index, 9 
setwise product, 71, 197 




sign of a permutation, 311 
similar matrices, 306 
simple field extension, 160 
singular matrix, 310 
solvable 
extension field, 266 
group, 271 
polynomial, 265 
splits completely, 159 
splitting field, 160 
explicit description, 161 
uniqueness up to isomorphism, 
183 
squarefree, 209 
stabilizer, 212 
standard basis, 315 
subfield, 154 
subgroup, 38 
characteristic, 205 
commutator, 94 
generated by a set, 41 
index of, 76 
of index 2 is normal, 78 
must contain e, 38 
normal, 64 
test, 39 
subring, 103 
subspace (of a vector space), 128 
spanned by a set, 128 
test, 128 
superfield, 154 
surjective function, 7 
Sylow’s Theorem, 216, 221 
symmetric 
functions, 281 
matrix, 313 
symmetry, 47 
rigid, 48 


term of a polynomial, 142 

Theorem Q, 336 

To divide is to contain, 121, 123, 229 
torsion, 361 

totient function (of Euler), 194, 208 
tower of fields, 162 




transcendental element, 155 
existence (in R), 167 
transcendental function, 334 
transpose of a matrix, 313 
transposition, 281 
triangular matrix, 313 
trisecting an angle, 248, 249 
two-thirds rule 
for group elements, 44 
for polynomials over a field, 152 


unique factorization domain (UFD), 
229 
uniqueness 
of identity element, 30 
of inverse element, 30 
unit (of a ring), 99 
units 
of a direct product, 194 
of a polynomial ring over a 
domain, 144 
of a power series ring, 333 
of $R = \mathbb{Z}[\sqrt{2}]$, 111
of Z/nZ, 123 
universal property 
of a direct product, 186 
of localization, 226, 238 


vector 
formal definition, 126 
informal definition, 126 
vector space, 126 


word, 55 


zerodivisor, 113 
Zorn’s Lemma, 297 


